Abstract

Consider the noisy underdetermined system of linear equations: $y = Ax^0 + z^0$, with $n \times N$ measurement matrix $A$, $n < N$, and Gaussian white noise $z^0 \sim N(0, \sigma^2 I)$. Both $y$ and $A$ are known, both $x^0$ and $z^0$ are unknown, and we seek an approximation to $x^0$.

When $x^0$ has few nonzeros, useful approximations are often obtained by $\ell_1$-penalized $\ell_2$ minimization, in which the reconstruction $\hat{x}^{1,\lambda}$ solves $\min \frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$.

Evaluate performance by mean-squared error ($\mathrm{MSE} = E\|\hat{x}^{1,\lambda} - x^0\|_2^2/N$). Consider matrices $A$ with iid Gaussian entries and a large-system limit in which $n, N \to \infty$ with $n/N \to \delta$ and $k/n \to \rho$. Call the ratio $\mathrm{MSE}/\sigma^2$ the noise sensitivity. We develop formal expressions for the MSE of $\hat{x}^{1,\lambda}$, and evaluate its worst-case formal noise sensitivity over all types of $k$-sparse signals. The phase space $0 < \delta, \rho < 1$ is partitioned by the curve $\rho = \rho_{MSE}(\delta)$ into two regions. Formal noise sensitivity is bounded throughout the region $\rho < \rho_{MSE}(\delta)$ and is unbounded throughout the region $\rho > \rho_{MSE}(\delta)$.

The phase boundary $\rho = \rho_{MSE}(\delta)$ is identical to the previously known phase transition curve for equivalence of $\ell_1$ and $\ell_0$ minimization in the $k$-sparse noiseless case. Hence a single phase boundary describes the fundamental phase transitions both for the noiseless and noisy cases.

Extensive computational experiments validate the predictions of this formalism, including the existence of game-theoretic structures underlying it (saddlepoints in the payoff, least-favorable signals and maximin penalization).

Underlying our formalism is an approximate message passing soft thresholding algorithm (AMP) introduced earlier by the authors. Other papers by the authors detail expressions for the formal MSE of AMP and its close connection to $\ell_1$-penalized reconstruction. Here we derive the minimax formal MSE of AMP and then read out results for $\ell_1$-penalized reconstruction.
1 Introduction



Consider the noisy underdetermined system of linear equations: 

$$y = Ax^0 + z^0, \qquad (1.1)$$

where the matrix $A$ is $n \times N$, $n < N$, the $N$-vector $x^0$ is $k$-sparse (i.e. it has at most $k$ non-zero entries), and $z^0 \in \mathbb{R}^n$ is a Gaussian white noise $z^0 \sim N(0, \sigma^2 I)$. Both $y$ and $A$ are known, both $x^0$ and $z^0$ are unknown, and we seek an approximation to $x^0$.

A very popular approach estimates $x^0$ via the solution $\hat{x}^{1,\lambda}$ of the following convex optimization problem

$$(P_{2,\lambda,1}) \qquad \text{minimize} \quad \frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1. \qquad (1.2)$$
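For readers who want to experiment with (1.2) directly, here is a minimal sketch of one standard solver: iterative soft thresholding (ISTA, i.e. proximal gradient descent). This is not claimed to be the solver used in this paper. The function names `soft_threshold` and `lasso_ista`, the fixed iteration count, and the problem sizes are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(x, theta):
    """Elementwise soft thresholding: sign(x) * max(|x| - theta, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)


def lasso_ista(y, A, lam, n_iter=500):
    """Approximately minimize 0.5*||y - A x||_2^2 + lam*||x||_1 by ISTA.

    Each step takes a gradient step on the quadratic term with step size 1/L,
    followed by the proximal map of lam*||x||_1 (soft thresholding at lam/L).
    """
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient A^T (A x - y)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x


# Illustrative instance: A has iid N(0, 1/n) entries, x0 is k-sparse with +-1 nonzeros.
rng = np.random.default_rng(0)
n, N, k, sigma = 100, 200, 10, 0.05
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))
x0 = np.zeros(N)
x0[rng.choice(N, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
y = A @ x0 + sigma * rng.normal(size=n)

x_hat = lasso_ista(y, A, lam=0.05)
mse = np.sum((x_hat - x0) ** 2) / N  # per-coordinate squared error
```

Taking `lam` close to 0 approximates the $\ell_1$ minimization discussed below, although plain ISTA then converges slowly.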

Thousands of articles use or study this approach, which has variously been called LASSO, Basis Pursuit, or more prosaically, $\ell_1$-penalized least-squares [Tib96, CD95, CDS98]. There is a clear need to understand the extent to which $(P_{2,\lambda,1})$ accurately recovers $x^0$. Dozens of papers present partial results, often setting forth loose bounds on the behavior of $\hat{x}^{1,\lambda}$ (more below).



Even in the noiseless case $z^0 = 0$, understanding the reconstruction problem (1.1) poses a challenge, as the underlying system of equations $y = Ax^0$ is underdetermined. In this case it is informative to consider $\ell_1$ minimization,

$$(P_1) \qquad \text{minimize} \quad \|x\|_1, \qquad (1.3)$$
$$\text{subject to} \quad y = Ax. \qquad (1.4)$$

This is the $\lambda = 0$ limit of (1.2): its solution obeys $\hat{x}^{1,0} = \lim_{\lambda \to 0} \hat{x}^{1,\lambda}$.

The most precise information about the behavior of $\hat{x}^{1,0}$ is obtained by large-system analysis. Let $n, N$ tend to infinity so that $n \sim \delta N$ (see footnote 1), and correspondingly let the number of nonzeros satisfy $k \sim \rho n$. This gives a phase space $0 < \delta, \rho < 1$ expressing different combinations of undersampling $\delta$ and sparsity $\rho$. When the matrix $A$ has iid Gaussian elements, the phase space $0 < \delta, \rho < 1$ can be divided into two components, or phases, separated by a curve $\rho = \rho_{\ell_1}(\delta)$ that can be explicitly computed. Below this curve, $x^0$ is sparse enough that $\hat{x}^{1,0} = x^0$ with high probability, so $\ell_1$ minimization perfectly recovers the sparse vector $x^0$. Above this curve, sparsity is not sufficient: we have $\hat{x}^{1,0} \neq x^0$ with high probability. Hence the curve $\rho = \rho_{\ell_1}(\delta)$, $0 < \delta < 1$, indicates the precise tradeoff between undersampling and sparsity.

Many authors have considered the behavior of $\hat{x}^{1,\lambda}$ in the noisy case, but the results are somewhat less conclusive. The most well-known analytic approach is the Restricted Isometry Principle (RIP), developed by Candès and Tao [CT05, CT07]. Again in the case where $A$ has iid Gaussian entries, and in the same large-system limit, the RIP implies that, under sufficient sparsity of $x^0$, with high probability one has stability bounds of the form $\|\hat{x}^{1,\lambda} - x^0\|_2 \le C(\delta, \rho)\|z^0\|_2 \log N$. The region where $C(\delta, \rho) < \infty$ was originally known only implicitly, though it was clearly a nonempty region of the $(\delta, \rho)$ phase space. Blanchard, Cartis and Tanner [BCT09] recently improved the estimates of $C$ for Gaussian matrices $A$, using careful large deviations analysis and an asymmetric RIP. They obtained the largest region where $\hat{x}^{1,\lambda}$ is currently known to be stable. Unfortunately, as they show, this region is still relatively small compared to the region $\rho < \rho_{\ell_1}(\delta)$, $0 < \delta < 1$.

It may seem that, in the presence of noise, the precise tradeoff between undersampling and sparsity worsens dramatically compared to the noiseless case. In fact, the opposite is true. In this paper, we show that in the presence of Gaussian white noise, the mean-squared error of the optimally tuned $\ell_1$-penalized least squares estimator behaves well over quite a large region of the phase plane. In fact, it is finite over exactly the same region of the phase plane as the region of $\ell_1$-$\ell_0$ equivalence derived in the noiseless case.

¹ Here and below we write $a \sim b$ if $a/b \to 1$ as both quantities tend to infinity.

Our main results, stated in Section 3, give explicit evaluations for the worst-case formal mean square error of $\hat{x}^{1,\lambda}$ under given conditions of noise, sparsity and undersampling. Our results indicate the noise sensitivity of solutions to (1.2), the optimal penalization parameter $\lambda$, and the hardest-to-recover sparse vector. As we show, the noise sensitivity exhibits a phase transition in the undersampling-sparsity $(\delta, \rho)$ domain along a curve $\rho = \rho_{MSE}(\delta)$. This curve is precisely the same as the $\ell_1$-$\ell_0$ equivalence curve $\rho_{\ell_1}$.

Our results might be compared to the work of Xu and Hassibi [XH09], who considered a different departure from the noiseless case. In their work, the noise $z^0$ was still vanishing, but the vector $x^0$ was allowed to be an $\ell_1$-norm bounded perturbation of a $k$-sparse vector. They considered stable recovery with respect to such small perturbations and showed that the natural boundary for such stable recovery is again the curve $\rho = \rho_{MSE}(\delta)$.

1.1 Results of our Formalism 

We define below a so-called formal MSE (fMSE), and evaluate the (minimax, formal) noise sensitivity:

$$M^*(\delta, \rho) = \sup_{\sigma > 0} \max_{\nu} \min_{\lambda} \mathrm{fMSE}(\hat{x}^{1,\lambda}, \nu, \sigma^2)/\sigma^2; \qquad (1.5)$$

here $\nu$ denotes the marginal distribution of $x^0$ (which has a fraction of nonzeros no larger than $\rho\delta$), and $\lambda$ denotes the tuning parameter of the $\ell_1$-penalized $\ell_2$ minimization. Let $M^\pm(\varepsilon)$ denote the minimax MSE of scalar thresholding, defined in Section 2 below. Let $\rho_{MSE}(\delta)$ denote the solution of

$$M^\pm(\rho\delta) = \delta. \qquad (1.6)$$

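Our main theoretical result is the formula

$$M^*(\delta, \rho) = \begin{cases} \dfrac{M^\pm(\delta\rho)}{1 - M^\pm(\delta\rho)/\delta}, & \rho < \rho_{MSE}(\delta), \\[1ex] \infty, & \rho > \rho_{MSE}(\delta). \end{cases} \qquad (1.7)$$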
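Once $M^\pm$ is available, (1.6) and (1.7) are straightforward to evaluate numerically. The sketch below assumes the closed form (2.5) of Section 2 for the worst-case soft thresholding MSE. It minimizes that expression over $\tau$ by bisection on its derivative, which is increasing in $\tau$ because (2.5) is strictly convex in $\tau$. It then solves (1.6) for $\rho_{MSE}(\delta)$ by a second bisection. All function names are hypothetical.

```python
import math


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def minimax_mse(eps, iters=100):
    """M^{+-}(eps): minimum over tau of the worst-case MSE (2.5), for 0 < eps < 1."""
    # Half the derivative of (2.5) with respect to tau; it is increasing in tau.
    def half_grad(t):
        return eps * t + 2.0 * (1.0 - eps) * (t * Phi(-t) - phi(t))

    lo, hi = 0.0, 20.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if half_grad(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)  # approximately tau^{+-}(eps)
    return eps * (1 + t * t) + (1 - eps) * (2 * (1 + t * t) * Phi(-t) - 2 * t * phi(t))


def rho_mse(delta, iters=60):
    """Solve M^{+-}(rho * delta) = delta for rho in (0, 1), eq. (1.6)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if minimax_mse(mid * delta) < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def noise_sensitivity(delta, rho):
    """Minimax formal noise sensitivity M*(delta, rho) from eq. (1.7)."""
    if rho >= rho_mse(delta):
        return math.inf
    m = minimax_mse(delta * rho)
    return m / (1.0 - m / delta)


boundary = rho_mse(0.5)  # phase boundary at delta = 0.5
```

Because $M^\pm$ is increasing, the bisection on $\rho$ is well posed. Bisection was chosen over a Newton method only for robustness.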



Quantity (1.5) is the payoff of a traditional two-person zero-sum game. The undersampling and sparsity are fixed in advance, and the researcher plays against Nature: Nature picks both a noise level and a signal distribution, and the researcher picks a penalization level with knowledge of Nature's choices. In analyzing such games, it is traditional to identify the least-favorable strategy of Nature (who maximizes payout from the researcher) and the optimal strategy for the researcher (who wants to minimize payout). We are able to identify both and give explicit formulas for the so-called saddlepoint strategy, where Nature plays the least-favorable strategy against the researcher and the researcher minimizes the consequent damage. In Proposition 3.1 below we give formulas for this pair of strategies. The phase-transition structure evident in (1.7) says that above the curve $\rho_{MSE}$, Nature has unboundedly good strategies available, to which the researcher has no effective response.
1.2 Structure of the Formalism 

Our approach is presented in Section 4. It combines ideas from decision theory in mathematical statistics with message passing algorithms from information theory. On the one hand, as already evident from formula (1.7), quantities from mathematical statistics play a key role in our formulas. However, these quantities concern a completely different estimator in a completely different problem: the behavior of soft thresholding in estimating a single normal mean that is likely to be zero. As a result, the superficial appearance of the formulas conceals the type of analysis we are doing. That analysis concerns the properties of an iterative soft thresholding scheme introduced by the authors in [DMM09a] and further developed here. Our formalism neatly describes properties of the formal MSE of AMP as expectations taken in the equilibrium states of a state evolution. As described in [DMM10b], we can calibrate AMP to have the same operating characteristics as $\ell_1$-penalized least squares. Recalibrating the minimax formal MSE for AMP then yields the above results.

Figure 1: Contour lines of the minimax noise sensitivity $M^*(\delta, \rho)$ in the $(\rho, \delta)$ plane. The dotted black curve graphs the phase boundary $(\delta, \rho_{MSE}(\delta))$. Above this curve, $M^*(\delta, \rho) = \infty$. The colored lines present level sets of $M^*(\delta, \rho) = 1/8, 1/4, 1/2, 1, 2, 4$ (from bottom to top).



1.3 Empirical Validation 

We use the word formalism for the machinery underlying our derivations because it is not (yet) a rigorously proven method known to give correct results under established regularity conditions. In this sense our method resembles the replica and cavity methods of statistical physics, which are famously useful tools without rigorous general justification.

Our theoretical results are validated here by computational experiments. These show that the predictions of our formulas are accurate and, even more importantly, that the underlying formal structure leading to our predictions can be observed experimentally. That structure includes least-favorable objects, game-theoretic saddlepoints of the MSE payoff function, maximin tuning of $\lambda$, and unboundedness of the noise sensitivity above the phase transition. Our formalism makes many different kinds of predictions about quantities with clear operational significance and about their dynamical evolution in the AMP algorithm. In this respect it differs markedly from some other formalisms, such as the replica method, which make many fewer checkable predictions. In particular, as demonstrated in [DMM09a], the present formalism precisely describes the evolution of an actual low-complexity algorithm.

Admittedly, by computational means we can only check individual predictions in specific cases, 
whereas a full proof could cover all such cases. However, we make available software which checks 
these features so that interested researchers can check the same phenomena at parameter values that 
we did not investigate here. The evidence of our simulations is strong; it is not a realistic possibility that $\ell_1$-penalized least squares fails to have the limit behavior discovered here.

We focused in this paper on measurement matrices A with Gaussian iid entries. It was recently 
proved that the state evolution formalism at the core of our analysis is indeed asymptotically correct 
for Gaussian matrices $A$ [BM10]. We believe that similar results hold for matrices $A$ with uniformly bounded iid entries with zero mean and variance $1/n$. More generally, our results should extend to a broader universality class, including matrices with iid entries having the same mean and variance, under an appropriate light-tail condition. Proving that such predictions are correct for a broader universality class of estimation problems remains an outstanding mathematical challenge.

As discussed in Section 7, an alternative route, also from statistical physics, has recently been used to investigate similar questions: the replica method. We will argue that the present framework makes predictions about the actual dynamical behavior of algorithms and is computationally verifiable in great detail. By contrast, the replica method itself applies to no constructive algorithm and makes comparatively many fewer predictions.



2 Minimax MSE of Soft Thresholding 

We briefly recall notions from, e.g., [DJHS92, DJ94] and then generalize them. We wish to recover an $N$-vector $x^0 = (x^0(i) : 1 \le i \le N)$ that is observed in Gaussian white noise

$$y(i) = x^0(i) + z^0(i), \qquad 1 \le i \le N,$$

with $z^0(i) \sim N(0, \sigma^2)$ independent and identically distributed. This can be regarded as a special case of the compressed sensing model (1.1), with $n = N$ and $A = I$ the identity matrix, i.e. there is no underdetermined system of equations. We assume that $x^0$ is sparse. It then makes sense to consider soft thresholding

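$$\hat{x}^\tau(i) = \eta(y(i); \tau\sigma), \qquad 1 \le i \le N,$$
where the soft threshold function (with threshold level $\theta$) is defined by

$$\eta(x; \theta) = \begin{cases} x - \theta, & \theta < x, \\ 0, & -\theta \le x \le \theta, \\ x + \theta, & x < -\theta. \end{cases} \qquad (2.1)$$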

In words, the estimator $\hat{x}^\tau$ 'shrinks' the observations $y$ towards the origin by a multiple $\tau$ of the noise level $\sigma$.
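Below is a direct transcription of (2.1), vectorized with NumPy (an illustrative choice), together with the denoising rule $\hat{x}^\tau(i) = \eta(y(i); \tau\sigma)$. The name `eta` is hypothetical.

```python
import numpy as np


def eta(x, theta):
    """Soft threshold eta(x; theta) of eq. (2.1), applied elementwise (theta >= 0)."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)


sigma, tau = 1.0, 1.5
y = np.array([-3.0, -1.0, 0.2, 1.5, 4.0])
x_tau = eta(y, tau * sigma)  # entries -1.5, 0, 0, 0, 2.5
```

Observations with $|y(i)| \le \tau\sigma$ are set to zero. All others are moved towards the origin by exactly $\tau\sigma$.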

Instead of studying $x^0$ that are $k$-sparse, [DJHS92, DJ94] consider random variables $X$ that obey $P\{X \neq 0\} \le \varepsilon$, where $\varepsilon = k/n$. So let $\mathcal{F}_\varepsilon$ denote the set of probability measures placing all but $\varepsilon$ of their mass at the origin:

$$\mathcal{F}_\varepsilon = \{\nu : \nu \text{ is a probability measure with } \nu(\{0\}) \ge 1 - \varepsilon\}.$$
We define the soft thresholding mean square error by

$$\mathrm{mse}(\sigma^2; \nu, \tau) \equiv E\left\{\left[\eta(X + \sigma Z; \tau\sigma) - X\right]^2\right\}. \qquad (2.2)$$

Here expectation is with respect to independent random variables Z ~ N(0, 1) and X ~ v. 

It is important to allow general $\sigma$ in the calculations below. However, note the scale invariance

$$\mathrm{mse}(\sigma^2; \nu, \tau) = \sigma^2\, \mathrm{mse}(1; \nu_{1/\sigma}, \tau), \qquad (2.3)$$

where $\nu_a$ is the probability distribution obtained by rescaling $\nu$: $\nu_a(S) = \nu(\{x : ax \in S\})$. It follows that all calculations can be made in the $\sigma = 1$ setting, with results rescaled to obtain final answers. Below, when we deal with $\sigma = 1$, we suppress the $\sigma$ argument and simply write $\mathrm{mse}(\nu, \tau) \equiv \mathrm{mse}(1; \nu, \tau)$.
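The scale invariance (2.3) follows in one line from the homogeneity of the soft threshold. The sketch below writes out that step, using only (2.1) and (2.2):

```latex
\begin{aligned}
\eta(\sigma u;\sigma\tau) &= \sigma\,\eta(u;\tau), \qquad \sigma > 0,\\
\mathrm{mse}(\sigma^2;\nu,\tau)
  &= E\big[(\eta(X+\sigma Z;\tau\sigma)-X)^2\big]
   = \sigma^2\,E\big[(\eta(X/\sigma+Z;\tau)-X/\sigma)^2\big]
   = \sigma^2\,\mathrm{mse}(1;\nu_{1/\sigma},\tau),
\end{aligned}
```

The last step uses the fact that $X/\sigma$ has distribution $\nu_{1/\sigma}$ when $X \sim \nu$.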

The minimax threshold MSE was defined in [DJHS92, DJ94] by

$$M^\pm(\varepsilon) = \inf_{\tau > 0} \sup_{\nu \in \mathcal{F}_\varepsilon} \mathrm{mse}(\nu, \tau). \qquad (2.4)$$

(The superscript $\pm$ reminds us that, when the estimand $X$ is nonzero, it may take either sign. In Section 6.1, the superscript $+$ will be used to cover the case where $X > 0$.) We denote by $\tau^\pm(\varepsilon)$ the threshold level achieving the infimum. Figure 2 depicts the behavior of $M^\pm$ and $\tau^\pm$ as functions of $\varepsilon$. $M^\pm(\varepsilon)$ was studied in [DJ94], which contains a considerable amount of information about the behavior of the optimal threshold $\tau^\pm$ and the least favorable distribution $\nu^\pm_\varepsilon$. In particular, the optimal threshold behaves as

$$\tau^\pm(\varepsilon) \sim \sqrt{2\log(\varepsilon^{-1})}, \qquad \text{as } \varepsilon \to 0,$$

and is explicitly computable at finite $\varepsilon$.

A peculiar aspect of the results in [DJ94] requires us to generalize them somewhat. For a given, fixed $\tau > 0$, the worst case MSE obeys

$$\sup_{\nu \in \mathcal{F}_\varepsilon} \mathrm{mse}(\nu, \tau) = \varepsilon(1 + \tau^2) + (1 - \varepsilon)\left[2(1 + \tau^2)\Phi(-\tau) - 2\tau\phi(\tau)\right], \qquad (2.5)$$

with $\phi(z) = \exp(-z^2/2)/\sqrt{2\pi}$ the standard normal density and $\Phi(z) = \int_{-\infty}^z \phi(x)\,dx$ the Gaussian distribution function. This supremum is "achieved" only by a three-point mixture on the extended real line $\mathbb{R} \cup \{-\infty, \infty\}$:

$$\nu^*_\varepsilon = (1 - \varepsilon)\delta_0 + \frac{\varepsilon}{2}\delta_{\infty} + \frac{\varepsilon}{2}\delta_{-\infty}.$$
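Formula (2.5) can be checked numerically, and the minimax threshold $\tau^\pm(\varepsilon)$ located, with a few lines of Python. The sketch below first integrates the soft thresholding risk on a grid for a three-point mixture with a large but finite $\mu$; the result should approach (2.5). It then finds $\tau^\pm(\varepsilon)$ by bisection on the derivative of (2.5), which is increasing in $\tau$. For $\varepsilon = 1/10$ this should reproduce the value $\tau^* \approx 1.1403$ quoted in Figure 3. Function names and grid settings are hypothetical.

```python
import math
import numpy as np


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def sup_mse(eps, tau):
    """Right-hand side of (2.5): worst-case MSE of eta(.; tau) over F_eps, sigma = 1."""
    return eps * (1 + tau**2) + (1 - eps) * (2 * (1 + tau**2) * Phi(-tau) - 2 * tau * phi(tau))


def risk_at(mu, tau, n_grid=40001):
    """E[(eta(mu + Z; tau) - mu)^2] for Z ~ N(0, 1), by the trapezoidal rule on [-12, 12]."""
    z = np.linspace(-12.0, 12.0, n_grid)
    y = mu + z
    err = np.sign(y) * np.maximum(np.abs(y) - tau, 0.0) - mu
    f = err**2 * np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(z)))


def minimax_threshold(eps, iters=100):
    """tau^{+-}(eps): zero of the (increasing) tau-derivative of sup_mse, by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if eps * mid + 2 * (1 - eps) * (mid * Phi(-mid) - phi(mid)) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


eps = 0.1
tau = minimax_threshold(eps)  # close to 1.1403
M = sup_mse(eps, tau)         # M^{+-}(0.1)
# A three-point mixture with a large, finite mu nearly attains the supremum in (2.5).
mu = 50.0
mixture_mse = (1 - eps) * risk_at(0.0, tau) + eps * risk_at(mu, tau)
```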

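We will need approximations that place no mass at $\infty$. We say that a distribution $\nu_{\varepsilon,\alpha} \in \mathcal{F}_\varepsilon$ is $\alpha$-least-favorable for $\eta(\cdot\,; \tau)$ if it is the least-dispersed distribution in $\mathcal{F}_\varepsilon$ achieving a fraction $(1 - \alpha)$ of the worst case risk for $\eta(\cdot\,; \tau)$, i.e. if both (i)

$$\mathrm{mse}(\nu_{\varepsilon,\alpha}, \tau^\pm(\varepsilon)) = (1 - \alpha) \cdot \sup_{\nu \in \mathcal{F}_\varepsilon} \mathrm{mse}(\nu, \tau^\pm(\varepsilon)),$$

and (ii) $\nu_{\varepsilon,\alpha}$ has the smallest second moment for which (i) is true. The least favorable distribution $\nu_{\varepsilon,\alpha}$ has the form of a three-point mixture

$$\nu_{\varepsilon,\alpha} = (1 - \varepsilon)\delta_0 + \frac{\varepsilon}{2}\delta_{\mu^\pm(\varepsilon,\alpha)} + \frac{\varepsilon}{2}\delta_{-\mu^\pm(\varepsilon,\alpha)}.$$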
Figure 2: Left: $M^\pm(\varepsilon)$ as a function of $\varepsilon$; Right: $\tau^\pm(\varepsilon)$ as a function of $\varepsilon$.

Here $\mu^\pm(\varepsilon, \alpha)$ is an explicitly computable function (see below), and for fixed $\alpha > 0$ we have

$$\mu^\pm(\varepsilon, \alpha) \sim \sqrt{2\log(\varepsilon^{-1})}, \qquad \text{as } \varepsilon \to 0.$$

Note in particular the relatively weak role played by a. This shows that although the precise 
least-favorable situation places mass at infinity, in fact, an approximately least-favorable situation is 
already achieved much closer to the origin. 
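One way to compute $\mu^\pm(\varepsilon, \alpha)$ numerically is sketched below. It relies on the textbook closed form of the single-point risk $r(\mu, \tau) = E[(\eta(\mu + Z; \tau) - \mu)^2]$ and assumes that $r$ increases in $|\mu|$ towards $1 + \tau^2$. The search finds the smallest $\mu$ whose three-point mixture reaches a fraction $1 - \alpha$ of (2.5) at $\tau = \tau^\pm(\varepsilon)$, as in Figure 3. Because the mixture's second moment $\varepsilon\mu^2$ grows with $\mu$, the smallest such $\mu$ is also the least dispersed. All names are hypothetical.

```python
import math


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def risk_soft(mu, tau):
    """Closed-form E[(eta(mu + Z; tau) - mu)^2] for Z ~ N(0, 1)."""
    return (1 + tau**2
            + (mu**2 - tau**2 - 1) * (Phi(tau - mu) - Phi(-tau - mu))
            - (tau - mu) * phi(tau + mu) - (tau + mu) * phi(tau - mu))


def minimax_threshold(eps, iters=100):
    """tau^{+-}(eps) by bisection on the derivative of (2.5)."""
    lo, hi = 0.0, 20.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if eps * mid + 2 * (1 - eps) * (mid * Phi(-mid) - phi(mid)) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mixture_mse(eps, mu, tau):
    """mse(nu, tau) for nu = (1-eps) delta_0 + eps/2 (delta_mu + delta_-mu), sigma = 1."""
    return (1 - eps) * risk_soft(0.0, tau) + eps * risk_soft(mu, tau)


def least_favorable_mu(eps, alpha, iters=100):
    """Smallest mu >= 0 whose mixture reaches (1 - alpha) times (2.5) at tau^{+-}(eps)."""
    tau = minimax_threshold(eps)
    # (2.5) rewritten with risk_soft(0, tau) = 2(1 + tau^2) Phi(-tau) - 2 tau phi(tau)
    target = (1 - alpha) * (eps * (1 + tau**2) + (1 - eps) * risk_soft(0.0, tau))
    if mixture_mse(eps, 0.0, tau) >= target:
        return 0.0
    hi = 1.0
    while mixture_mse(eps, hi, tau) < target:  # terminates because alpha > 0
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mixture_mse(eps, mid, tau) < target:
            lo = mid
        else:
            hi = mid
    return hi


mu_example = least_favorable_mu(0.1, 0.02)  # epsilon = 1/10, alpha = 0.02 as in Figure 3
```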

3 Main Results 

The notation of the last section allows us to state our main results. 
3.1 Terminology 

Definition 3.1. (Large-System Limit). A sequence of problem size parameters $n, N$ will be said to grow proportionally if both $n, N \to \infty$ while $n/N \to \delta \in (0, 1)$.

Consider a sequence of random variables $(W_{n,N})$, where $n, N$ grow proportionally. Suppose that $W_{n,N}$ converges in probability to a deterministic quantity $W_\infty$, which may depend on $\delta > 0$. Then we say that $W_{n,N}$ has large-system limit $W_\infty$, denoted

$$W_\infty = \mathrm{ls\,lim}(W_{n,N}).$$

Definition 3.2. (Large-System Framework). We denote by $LSF(\delta, \rho, \sigma, \nu)$ a sequence of problem instances $(y, A, x^0)_{n,N}$ as per Eq. (1.1), indexed by problem sizes $n, N$ growing proportionally: $n/N \to \delta$. In each instance, the entries of the $n \times N$ matrix $A$ are Gaussian iid $N(0, 1/n)$, the entries of $z^0$ are Gaussian iid $N(0, \sigma^2)$, and the entries of $x^0$ are iid $\nu$.
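A single instance from $LSF(\delta, \rho, \sigma, \nu)$ can be simulated as follows. For concreteness, the sketch assumes that $\nu$ is the symmetric three-point mixture $(1 - \varepsilon)\delta_0 + \frac{\varepsilon}{2}\delta_\mu + \frac{\varepsilon}{2}\delta_{-\mu}$ with $\varepsilon = \delta\rho$, so that $\nu \in \mathcal{F}_{\delta\rho}$. The helper names are hypothetical.

```python
import numpy as np


def three_point_sample(size, eps, mu, rng):
    """iid draws from (1-eps) delta_0 + eps/2 delta_mu + eps/2 delta_{-mu}."""
    nonzero = rng.random(size) < eps
    signs = rng.choice([-1.0, 1.0], size=size)
    return np.where(nonzero, mu * signs, 0.0)


def lsf_instance(N, delta, rho, sigma, mu, rng):
    """One instance (y, A, x0) of LSF(delta, rho, sigma, nu), nu three-point with eps = delta*rho."""
    n = int(round(delta * N))
    A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))  # iid N(0, 1/n) entries
    x0 = three_point_sample(N, delta * rho, mu, rng)   # iid entries from nu
    z0 = rng.normal(0.0, sigma, size=n)                # iid N(0, sigma^2) noise
    return A @ x0 + z0, A, x0


rng = np.random.default_rng(1)
y, A, x0 = lsf_instance(N=2000, delta=0.5, rho=0.2, sigma=1.0, mu=3.0, rng=rng)
```

Because the entries of $x^0$ are iid rather than exactly $k$-sparse, the number of nonzeros fluctuates around $\varepsilon N$.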
(Figure 3 panel title: Finding NLF $\mu$; $\varepsilon = 1/10$, $\tau^* = 1.1403$, $\alpha = .02$.)

Figure 3: Illustration of $\alpha$-least-favorable $\nu$. For $\varepsilon = 1/10$, we consider soft thresholding with the minimax parameter $\tau^\pm(\varepsilon)$. We identify the smallest $\mu$ such that the measure $\nu_{\varepsilon,\mu} = (1 - \varepsilon)\delta_0 + \frac{\varepsilon}{2}\delta_\mu + \frac{\varepsilon}{2}\delta_{-\mu}$ has $\mathrm{mse}(\nu_{\varepsilon,\mu}, \tau^*) \ge 0.98\, M^\pm(0.1)$ (i.e. the MSE is at least 98% of the minimax MSE).



For the sake of concreteness, we focus here on problem sequences in which the matrix $A$ has iid Gaussian entries. An obvious generalization of this setting would be to assume that the entries are iid with mean 0 and variance $1/n$. We expect our result to hold for a broad set of distributions in this class.



In order to match the $k$-sparsity condition underlying (1.1), we consider the standard framework only for $\nu \in \mathcal{F}_{\delta\rho}$.

Definition 3.3. (Observable). Let $\hat{x}$ denote the output of a reconstruction algorithm on problem instance $(y, A, x^0)$. An observable $J$ is a function $J(y, A, x^0, \hat{x})$ of the tuple $(y, A, x^0, \hat{x})$.

In an abuse of notation, the realized values $J_{n,N} = J(y, A, x^0, \hat{x})$ in this framework will also be called observables. An example is the observed per-coordinate MSE:

$$\mathrm{MSE} \equiv \frac{1}{N}\|\hat{x} - x^0\|_2^2.$$

The MSE depends explicitly on $x^0$ and implicitly on $y$ and $A$ (through the reconstruction algorithm). Unless specified otherwise, we assume that the reconstruction algorithm solves the LASSO problem (1.2), and hence $\hat{x} = \hat{x}^{1,\lambda}$. In what follows, we drop the dependence of the observable on the arguments $y, A, x^0, \hat{x}$ and on the problem dimensions $n, N$ when it is clear from context.

Definition 3.4. (Formalism). A formalism is a procedure that assigns a purported large-system limit $\mathrm{Formal}(J)$ to an observable $J$ in the $LSF(\delta, \rho, \sigma, \nu)$. This limit in general depends on $\delta$, $\rho$, $\sigma^2$, and $\nu \in \mathcal{F}_{\delta\rho}$: $\mathrm{Formal}(J) = \mathrm{Formal}(J; \delta, \rho, \sigma, \nu)$.
Thus, in the sections below we consider $J = \mathrm{MSE}(y, A, x^0, \hat{x}^{1,\lambda})$ and describe a specific formalism yielding $\mathrm{Formal}(\mathrm{MSE})$, the formal MSE (also denoted fMSE). When applied to MSE, our formalism works as follows: for each $\sigma^2$, $\delta$, and probability measure $\nu$ on $\mathbb{R}$, it calculates a purported limit $\mathrm{fMSE}(\delta, \nu, \sigma)$. For a problem instance with large $n, N$ realized from the standard framework $LSF(\delta, \rho, \sigma, \nu)$, we claim the MSE will be approximately $\mathrm{fMSE}(\delta, \nu, \sigma)$. In fact, we will show how to calculate formal limits for several observables. For clarity, we always attach the modifier formal to any result of our formalism: e.g., formal MSE, formal False Alarm Rate, formally optimal threshold parameter, and so on.

Definition 3.5. (Validation). A formalism is theoretically validated by proving that, in the standard asymptotic framework, we have

$$\mathrm{ls\,lim}(J_{n,N}) = \mathrm{Formal}(J)$$

for a class $\mathcal{J}$ of observables to which the formalism applies, and for a range of $LSF(\delta, \rho, \sigma^2, \nu)$.

A formalism is empirically validated by showing that, for problem instances $(y, A, x^0)$ realized from $LSF(\delta, \rho, \sigma, \nu)$ with large $N$, we have

$$J_{n,N} \approx \mathrm{Formal}(J; \delta, \rho, \sigma, \nu),$$

for a collection of observables $J \in \mathcal{J}$ and a range of asymptotic framework parameters $(\delta, \rho, \sigma, \nu)$. Here the approximation $\approx$ should be evaluated by the usual standards of empirical science.

Obviously, theoretical validation is stronger than empirical validation, but careful empirical validation is still validation. We do not attempt here to theoretically validate this formalism in any generality; see [BM10] for results in this direction. Instead, we view the formalism as calculating predictions of empirical results. We have compared these predictions with empirical results and found a persuasive level of agreement. For example, our formalism has been used to predict the MSE of reconstructions by (1.2), and actual empirical results match the predictions, i.e.:

$$\frac{1}{N}\|\hat{x}^{1,\lambda} - x^0\|_2^2 \approx \mathrm{fMSE}(\delta, \rho, \nu, \sigma).$$



3.2 Results of the Formalism 

The behavior of formal mean square error changes dramatically at the following phase boundary. 

Definition 3.6 (Phase Boundary). For each $\delta \in [0, 1]$, let $\rho_{MSE}(\delta)$ be the value of $\rho$ solving

$$M^\pm(\rho\delta) = \delta. \qquad (3.1)$$

It is well known that $M^\pm(\varepsilon)$ is monotone increasing and concave in $\varepsilon$, with $M^\pm(0) = 0$ and $M^\pm(1) = 1$. As a consequence, $\rho_{MSE}$ is also a monotone increasing function of $\delta$. In fact, $\rho_{MSE}(\delta) \to 0$ as $\delta \to 0$ and $\rho_{MSE}(\delta) \to 1$ as $\delta \to 1$. An explicit expression for the curve $(\delta, \rho_{MSE}(\delta))$ is provided in Appendix A.

Proposition 3.1. Results of Formalism. The formalism developed below yields the following 
conclusions. 
1.a In the region $\rho < \rho_{MSE}(\delta)$, the minimax formal noise sensitivity obeys the formula

$$M^*(\delta, \rho) \equiv \frac{M^\pm(\delta\rho)}{1 - M^\pm(\delta\rho)/\delta}.$$

In particular, $M^*$ is finite throughout this region.

1.b With $\sigma^2$ the noise level in (1.1), define the formal noise-plus-interference level $\mathrm{fNPI} = \mathrm{fNPI}(\tau; \delta, \rho, \sigma, \nu)$ by

$$\mathrm{fNPI} \equiv \sigma^2 + \mathrm{fMSE}/\delta,$$

and its minimax value by $\mathrm{NPI}^*(\delta, \rho; \sigma) \equiv \sigma^2 \cdot (1 + M^*(\delta, \rho)/\delta)$. For $\alpha > 0$, define

$$\mu^*(\delta, \rho; \alpha) \equiv \mu^\pm(\delta\rho, \alpha) \cdot \sqrt{\mathrm{NPI}^*(\delta, \rho; \sigma)}.$$

In $LSF(\delta, \rho, \sigma, \nu)$, let $\nu \in \mathcal{F}_{\delta\rho}$ place a fraction $1 - \delta\rho$ of its mass at zero and the remaining mass equally on $\pm\mu^*(\delta, \rho; \alpha)$. This $\nu$ is $\alpha$-least-favorable: the formal noise sensitivity of $\hat{x}^{1,\lambda}$ equals $(1 - \tilde{\alpha})M^*(\delta, \rho)$, with $(1 - \tilde{\alpha}) = (1 - \alpha)(1 - M^\pm(\delta\rho))/(1 - (1 - \alpha)M^\pm(\delta\rho))$.

1.c The formally maximin penalty parameter obeys

$$\lambda^*(\nu; \delta, \rho, \sigma) \equiv \tau^\pm(\delta\rho) \cdot \sqrt{\mathrm{fNPI}(\tau^\pm; \delta, \rho, \sigma, \nu)} \cdot \left(1 - \mathrm{EqDR}(\nu; \tau^\pm(\delta\rho))/\delta\right),$$

where $\mathrm{EqDR}(\cdots)$ is the asymptotic detection rate, i.e. the asymptotic fraction of coordinates that are estimated to be nonzero. (An explicit expression for this quantity is given in Section 4.3.)

In particular, with this $\nu$-adaptive choice of penalty parameter, the formal MSE of $\hat{x}^{1,\lambda}$ does not exceed $M^* \cdot \sigma^2$.

2. In the region $\rho > \rho_{MSE}(\delta)$, the formal noise sensitivity is infinite. Throughout this phase, for each fixed number $M < \infty$, there exists $\alpha > 0$ such that the probability distribution $\nu \in \mathcal{F}_{\delta\rho}$ placing its nonzeros at $\pm\mu^*(\delta, \rho, \alpha)$ yields formal MSE larger than $M$.
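The calibration in item 1.c can be evaluated once the formal MSE is known. The sketch below makes several assumptions. First, it takes fMSE as an input, for instance a fixed point of the state evolution described in Section 4. Second, it assumes a symmetric three-point $\nu$ with nonzeros at $\pm\mu$. Third, it computes EqDR as the probability that $|X + \sqrt{\mathrm{fNPI}}\,Z|$ exceeds $\tau\sqrt{\mathrm{fNPI}}$. This third point is our reading of "fraction of coordinates estimated to be nonzero" for a soft thresholding step at noise level $\sqrt{\mathrm{fNPI}}$; the paper's own expression for EqDR is given in Section 4.3. Function names and the numbers used are hypothetical.

```python
import math


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def detection_rate(eps, mu, tau, npi):
    """Assumed EqDR: P(|X + sqrt(npi) Z| > tau sqrt(npi)), X ~ (1-eps) delta_0 + eps/2 (delta_mu + delta_-mu)."""
    s = math.sqrt(npi)
    zero_part = 2.0 * Phi(-tau)                                # false alarms on zero coordinates
    signal_part = Phi(mu / s - tau) + Phi(-mu / s - tau)       # detections on coordinates at +-mu
    return (1 - eps) * zero_part + eps * signal_part


def maximin_lambda(tau, sigma, fmse, delta, eps, mu):
    """lambda* = tau * sqrt(fNPI) * (1 - EqDR / delta), with fNPI = sigma^2 + fMSE / delta."""
    fnpi = sigma**2 + fmse / delta
    eqdr = detection_rate(eps, mu, tau, fnpi)
    return tau * math.sqrt(fnpi) * (1.0 - eqdr / delta)


# Illustrative numbers only: delta = 0.5, eps = delta * rho = 0.1, tau near tau^{+-}(0.1).
lam = maximin_lambda(tau=1.1403, sigma=1.0, fmse=0.5, delta=0.5, eps=0.1, mu=3.0)
```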

We explain the formalism and derive these results in Section 4 below.
3.3 Interpretation of the Predictions 

Figure 1 displays the noise sensitivity; above the phase transition boundary $\rho = \rho_{MSE}(\delta)$, it is infinite. The contour lines show positions in the $(\delta, \rho)$ plane where a given noise sensitivity is achieved. As one might expect, the sensitivity blows up rather dramatically as we approach the phase boundary.

Figure 4 displays the least-favorable coefficient amplitude $\mu^*(\delta, \rho, \alpha = 0.02)$. Notice that $\mu^*(\delta, \rho, \alpha)$ diverges as the phase boundary is approached. Indeed, beyond the phase boundary, arbitrarily large MSE can be produced by choosing $\mu$ large enough.

Figure 5 displays the value of the optimal penalization parameter $\lambda^* = \lambda^*(\nu; \delta, \rho, \sigma = 1)$. Note that the parameter tends to zero as we approach the phase transition.

In these figures, the region above the phase transition is not decorated, because the values there are infinite or undefined.
Figure 4: Contour lines of the near-least-favorable signal amplitude $\mu^*(\delta, \rho, \alpha)$ in the $(\rho, \delta)$ plane. The dotted line corresponds to the phase transition $(\delta, \rho_{MSE}(\delta))$, while the colored solid lines portray level sets of $\mu^*(\delta, \rho, \alpha)$. The 3-point mixture distribution $(1 - \varepsilon)\delta_0 + \frac{\varepsilon}{2}\delta_\mu + \frac{\varepsilon}{2}\delta_{-\mu}$ ($\varepsilon = \delta\rho$) will cause 98% of the worst-case MSE. When a $k$-sparse vector is drawn from this distribution, its nonzeros are all at $\pm\mu$.

Figure 5: Contour lines of the maximin penalization parameter $\lambda^*(\delta, \rho)$ in the $(\rho, \delta)$ plane. The dotted line corresponds to the phase transition $(\delta, \rho_{MSE}(\delta))$, while thin lines are contours for $\lambda^*(\delta, \rho, \alpha)$. Close to the phase transition, the maximin value approaches 0.
3.4 Comparison to other phase transitions



In view of the importance of the phase boundary for Proposition 3.1, we note the following: 



Finding 3.1. Phase Boundary Equivalence. The phase boundary $\rho_{MSE}$ is identical to the phase boundary $\rho_{\ell_1}$ below which $\ell_1$ minimization and $\ell_0$ minimization are equivalent.

In words, throughout the phase where $\ell_1$ minimization is equivalent to $\ell_0$ minimization, the solution to (1.2) has bounded formal MSE. Outside that phase, the solution has unbounded formal MSE. The verification of Finding 3.1 proceeds in two steps. First, the formulas for the phase boundary discussed in this paper are identical to the phase boundary formulas given in [DMM09b]. Second, [DMM09b] showed that these formulas agree numerically with the known formulas for $\rho_{\ell_1}$.



3.5 Validating the Predictions 



Proposition 3.1 makes predictions for the behavior of solutions to (1.2). It will be validated empirically, by showing that such solutions behave as predicted.

In particular, simulation evidence will be presented to show that, first, in the phase where noise sensitivity is finite:

1. Running (1.2) for data $(y, A)$ generated from vectors $x^0$ whose coordinates have a nearly least-favorable distribution $\nu$ results in an empirical MSE approximately equal to $M^*(\delta, \rho) \cdot \sigma^2$.

2. Running (1.2) for data $(y, A)$ generated from vectors $x^0$ whose coordinates have a distribution $\nu$ far from least-favorable results in an empirical MSE noticeably smaller than $M^*(\delta, \rho) \cdot \sigma^2$.

3. Running (1.2) with a suboptimal penalty parameter $\lambda$ results in an empirical MSE noticeably greater than $M^*(\delta, \rho) \cdot \sigma^2$.

Second, in the phase where formal MSE is infinite:

4. Running (1.2) on vectors $x^0$ generated from formally least-favorable distributions results in a very large empirical MSE.

Evidence for all these claims will be given below.



4 The formalism 

4.1 The AMPT Algorithm 

We now consider a reconstruction approach seemingly very different from $(P_{2,\lambda,1})$. This algorithm, called the first-order approximate message passing (AMP) algorithm, proceeds iteratively. It starts at $x^0 = 0$ and produces the estimate $x^t$ of $x^0$ at iteration $t$ according to the iteration:

$$z^t = y - Ax^t + \frac{\mathrm{df}_t}{n} z^{t-1}, \qquad (4.1)$$
$$x^{t+1} = \eta(A^T z^t + x^t; \theta_t). \qquad (4.2)$$

Here $x^t \in \mathbb{R}^N$ is the current estimate of $x^0$, and $\mathrm{df}_t = \|x^t\|_0$ is the number of nonzeros in the current estimate. Again, $\eta(\cdot\,; \cdot)$ is the soft threshold nonlinearity with threshold parameter

$$\theta_t = \tau \cdot \sigma_t; \qquad (4.3)$$

$\tau$ is a tuning constant, fixed throughout the iterations, and $\sigma_t$ is an empirical measure of the scale of the residuals. Finally, $z^t \in \mathbb{R}^n$ is the current working residual. It is related to the usual residual $r^t = y - Ax^t$ by the identity $z^t = r^t + \frac{\mathrm{df}_t}{n} z^{t-1}$. The extra term in AMP plays a subtle but crucial role.
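Iterations (4.1) through (4.3) can be sketched compactly as follows. The text only says that $\sigma_t$ is "an empirical measure of the scale of the residuals", so the sketch assumes one common choice, $\sigma_t = \|z^t\|_2/\sqrt{n}$. It also assumes a fixed number of iterations. The names and problem sizes are illustrative.

```python
import numpy as np


def eta(x, theta):
    """Soft threshold, elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)


def amp(y, A, tau, n_iter=50):
    """AMP with threshold theta_t = tau * sigma_t, started at x = 0.

    Assumption: sigma_t is estimated by ||z^t||_2 / sqrt(n).
    """
    n, N = A.shape
    x = np.zeros(N)
    z = np.zeros(n)  # previous residual; with x = 0 the correction term is zero at t = 0
    for _ in range(n_iter):
        df = np.count_nonzero(x)             # df_t = ||x^t||_0
        z = y - A @ x + (df / n) * z         # (4.1): residual plus correction term
        sigma_t = np.linalg.norm(z) / np.sqrt(n)
        x = eta(A.T @ z + x, tau * sigma_t)  # (4.2) with theta_t from (4.3)
    return x


# Illustrative instance: delta = 0.5, eps = 0.05 (rho = 0.1), nonzeros at +-1, sigma = 0.05.
rng = np.random.default_rng(2)
n, N, sigma = 1000, 2000, 0.05
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))
x0 = np.where(rng.random(N) < 0.05, rng.choice([-1.0, 1.0], size=N), 0.0)
y = A @ x0 + sigma * rng.normal(size=n)

x_hat = amp(y, A, tau=1.5)
mse = np.mean((x_hat - x0) ** 2)
```

Dropping the `(df / n) * z` term turns this into plain iterative soft thresholding on the usual residual $r^t$. This makes it easy to compare the two variants empirically.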

4.2 Formal MSE, and its evolution 

Let $\mathrm{npi}(m; \sigma, \delta) \equiv \sigma^2 + m/\delta$. We define the MSE map $\Psi$ through

$$\Psi(m, \delta, \sigma, \tau, \nu) \equiv \mathrm{mse}(\mathrm{npi}(m; \sigma, \delta); \nu, \tau), \qquad (4.4)$$

where the function $\mathrm{mse}(\cdot\,; \nu, \tau)$ is the soft thresholding mean square error already introduced in Eq. (2.2). It describes the MSE of soft thresholding in a problem where the noise level is $\sqrt{\mathrm{npi}}$. A heuristic explanation of the meaning and origin of npi will be given below.
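For a symmetric three-point $\nu$ (an illustrative assumption), $\Psi$ can be evaluated in closed form. By the scale invariance (2.3), $\mathrm{mse}(\mathrm{npi}; \nu, \tau) = \mathrm{npi} \cdot \mathrm{mse}(1; \nu_{1/\sqrt{\mathrm{npi}}}, \tau)$, and the single-point soft thresholding risk has a standard closed form. Function names are hypothetical.

```python
import math


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def risk_soft(mu, tau):
    """Closed-form E[(eta(mu + Z; tau) - mu)^2] for Z ~ N(0, 1)."""
    return (1 + tau**2
            + (mu**2 - tau**2 - 1) * (Phi(tau - mu) - Phi(-tau - mu))
            - (tau - mu) * phi(tau + mu) - (tau + mu) * phi(tau - mu))


def npi(m, sigma, delta):
    """Noise-plus-interference level sigma^2 + m / delta."""
    return sigma**2 + m / delta


def Psi(m, delta, sigma, tau, eps, mu):
    """MSE map (4.4) for nu = (1-eps) delta_0 + eps/2 (delta_mu + delta_-mu)."""
    v = npi(m, sigma, delta)
    if v == 0.0:
        return 0.0  # no noise and no interference: threshold is 0 and eta(x; 0) = x
    s = math.sqrt(v)
    # Scale invariance (2.3): atoms +-mu become +-mu/s at unit noise; the risk scales by v.
    return v * ((1 - eps) * risk_soft(0.0, tau) + eps * risk_soft(mu / s, tau))


value = Psi(0.3, 0.5, 1.0, 1.14, 0.1, 3.0)  # delta = 0.5, sigma = 1, tau = 1.14, eps = 0.1, mu = 3
```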

Definition 4.1. State Evolution. The state is a 5-tuple $(m; \delta, \sigma, \tau, \nu)$. State evolution is the evolution of the state by the rule

$$(m_t; \delta, \sigma, \tau, \nu) \mapsto (\Psi(m_t); \delta, \sigma, \tau, \nu), \qquad t \mapsto t + 1.$$

As the parameters $(\delta, \sigma, \tau, \nu)$ remain fixed during evolution, we usually omit mention of them and think of state evolution simply as the iterated application of $\Psi$:

$$m_t \mapsto m_{t+1} \equiv \Psi(m_t), \qquad t \mapsto t + 1.$$

Definition 4.2. Stable Fixed Point. The Highest Fixed Point of the continuous function $\Psi$ is

$$\mathrm{HFP}(\Psi) = \sup\{m : \Psi(m) \ge m\}.$$

The stability coefficient of the continuously differentiable function $\Psi$ is

$$\mathrm{SC}(\Psi) = \left.\frac{d}{dm}\Psi(m)\right|_{m = \mathrm{HFP}(\Psi)}.$$

We say that $\mathrm{HFP}(\Psi)$ is a stable fixed point if $0 < \mathrm{SC}(\Psi) < 1$.
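Definitions 4.1 and 4.2 translate directly into a numerical procedure. Iterate $\Psi$ from $\mu_2(\nu)$, take the limit as $\mathrm{HFP}(\Psi)$, and estimate $\mathrm{SC}(\Psi)$ by a central difference. The sketch below assumes a symmetric three-point $\nu$ and the closed-form single-point risk, with a stopping tolerance and step size chosen for illustration. Its starting point and the monotone convergence it exhibits follow Lemma 4.1 below. Names are hypothetical.

```python
import math


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def risk_soft(mu, tau):
    """Closed-form E[(eta(mu + Z; tau) - mu)^2] for Z ~ N(0, 1)."""
    return (1 + tau**2
            + (mu**2 - tau**2 - 1) * (Phi(tau - mu) - Phi(-tau - mu))
            - (tau - mu) * phi(tau + mu) - (tau + mu) * phi(tau - mu))


def make_psi(delta, sigma, tau, eps, mu):
    """Return the MSE map m -> Psi(m) for nu = (1-eps) delta_0 + eps/2 (delta_mu + delta_-mu)."""
    def psi(m):
        v = sigma**2 + m / delta  # npi(m; sigma, delta)
        if v == 0.0:
            return 0.0
        s = math.sqrt(v)
        return v * ((1 - eps) * risk_soft(0.0, tau) + eps * risk_soft(mu / s, tau))
    return psi


def state_evolution(psi, m0, tol=1e-12, max_iter=10_000):
    """Iterate m_{t+1} = Psi(m_t) from m0 until successive iterates differ by less than tol."""
    traj = [m0]
    for _ in range(max_iter):
        traj.append(psi(traj[-1]))
        if abs(traj[-1] - traj[-2]) < tol:
            break
    return traj


def stability_coefficient(psi, m, h=1e-6):
    """Central-difference estimate of dPsi/dm at m."""
    return (psi(m + h) - psi(m - h)) / (2.0 * h)


# delta = 0.5, eps = delta * rho = 0.1 (rho = 0.2), tau close to tau^{+-}(0.1), sigma = 1.
delta, eps, mu = 0.5, 0.1, 3.0
psi = make_psi(delta, sigma=1.0, tau=1.1403, eps=eps, mu=mu)
traj = state_evolution(psi, m0=eps * mu**2)  # start at mu_2(nu) = eps * mu^2
hfp = traj[-1]                               # numerical approximation of HFP(Psi)
sc = stability_coefficient(psi, hfp)         # numerical approximation of SC(Psi)
```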

To illustrate this, Figure 6 shows the MSE map and fixed points in three cases.

In what follows, we denote by $\mu_2(\nu) = \int x^2\, d\nu$ the second moment of the distribution $\nu$.

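Lemma 4.1. Let $\Psi(\cdot) = \Psi(\cdot\,, \delta, \sigma, \tau, \nu)$, and assume either $\sigma^2 > 0$ or $\mu_2(\nu) > 0$. Then the sequence of iterates $m_t$ defined by $m_{t+1} = \Psi(m_t)$, starting from $m_0 = \mu_2(\nu)$, converges monotonically to $\mathrm{HFP}(\Psi)$:

$$m_t \to \mathrm{HFP}(\Psi), \qquad t \to \infty.$$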


² A similar-looking algorithm was introduced by the authors in [DMM09a], with identical steps (4.1)-(4.2). It differed only in the choice of threshold: instead of a tuning parameter $\tau$ as in (4.3), which can be set freely, a fixed choice $\tau(\delta)$ was made for each specific $\delta$. Here we call that algorithm AMPM (M for minimax), as explained in [DMM09b]. In contrast, the current algorithm is tunable, allowing a choice of $\tau$; we label it AMPT($\tau$) (T for tunable).
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0.2 
MSEin 



0.5 1 
MSE in 



Figure 6: MSE Map \& in three cases, and associated fixed points. Left: 5 = 0.25, p = p MSE /2, a = 1, 
v = i/*(6,p,a) Center: 5 = 0.25, p = p M sE x 0.95, a = 1, v = v*(5,p,a) Right: 5 = 0.25, p = p M sE, 
a = 1, v = v*{b~, p, a) 



Further, if $\sigma > 0$ then $\mathrm{HFP}(\Psi) \in (0, \infty)$ is the unique fixed point.

Suppose further that the stability coefficient satisfies $0 < \mathrm{SC}(\Psi) < 1$. Then there exists a constant $A(\nu, \Psi)$ such that

$$|m_t - \mathrm{HFP}(\Psi)| \le A(\nu, \Psi)\, \mathrm{SC}(\Psi)^t.$$

Finally, if $\mu_2(\nu) > \mathrm{HFP}(\Psi)$, then the sequence $\{m_t\}$ decreases monotonically to $\mathrm{HFP}(\Psi)$, with

$$m_t - \mathrm{HFP}(\Psi) \le \mathrm{SC}(\Psi)^t \cdot (\mu_2(\nu) - \mathrm{HFP}(\Psi)).$$

In short, barring the trivial case $x^0 = 0$, $z^0 = 0$ (no signal, no noise), state evolution converges to the highest fixed point. If the stability coefficient is smaller than 1, convergence is exponentially fast.



Proof (Lemma J^.l). This Lemma is an immediate consequence of the fact that m i— > ^(m) is a 



concave non-decreasing function, with ^(0) > as long as a > and ^(0) = for a = 0. 

Indeed in [DMM09b] the authors showed that at noise level $\sigma = 0$, the MSE map $m \mapsto \Psi(m; \delta, \sigma = 0, \nu, \tau)$ is concave as a function of $m$. We have the identity

$$\Psi(m; \delta, \sigma, \nu, \tau) = \Psi(m + \sigma^2\delta; \delta, \sigma = 0, \nu, \tau),$$

relating the MSE map at noise level $\sigma$ to the MSE map at noise level 0. From this it follows that $\Psi$ is concave for $\sigma > 0$ as well. Also, [DMM09b] shows that $\Psi(m = 0; \delta, \sigma = 0, \nu, \tau) = 0$ and $\frac{d\Psi}{dm}(m = 0; \delta, \sigma = 0, \nu, \tau) > 0$, whence $\Psi(m = 0; \delta, \sigma, \nu, \tau) > 0$ for any positive noise level $\sigma$. □
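The identity used in this proof is just a shift in the argument of npi. Assuming, as in the equilibrium noise-plus-interference formula, that $\mathrm{npi}(m;\sigma,\delta) = \sigma^2 + m/\delta$ and $\Psi(m;\delta,\sigma,\nu,\tau) = \mathrm{mse}(\mathrm{npi}(m;\sigma,\delta);\nu,\tau)$, a one-line check is:

$$\mathrm{npi}(m + \sigma^2\delta;\, 0,\, \delta) = \frac{m + \sigma^2\delta}{\delta} = \sigma^2 + \frac{m}{\delta} = \mathrm{npi}(m;\, \sigma,\, \delta) \quad\Longrightarrow\quad \Psi(m;\delta,\sigma,\nu,\tau) = \Psi(m + \sigma^2\delta;\, \delta, 0, \nu, \tau).$$

Since a translate of a concave function is concave, concavity transfers from $\sigma = 0$ to $\sigma > 0$. Likewise $\Psi(0;\delta,\sigma,\nu,\tau) = \Psi(\sigma^2\delta;\delta,0,\nu,\tau)$ is positive, because $\Psi(\cdot;\delta,0,\nu,\tau)$ starts at 0 with positive slope and is non-decreasing.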
In the same paper [DMM09b], the authors derived the least-favorable stability coefficient in the noiseless case $\sigma = 0$:

$$\mathrm{SC}^*(\delta,\rho,\sigma = 0) = \sup_{\nu\in\mathcal{F}_{\delta\rho}} \mathrm{SC}(\Psi(\cdot;\delta,\sigma = 0,\nu,\tau)).$$

They showed that, for $M^{\pm}(\delta\rho) < \delta$, the only fixed point is at $m = 0$ and has stability coefficient

$$\mathrm{SC}^*(\delta,\rho,\sigma = 0) = M^{\pm}(\delta\rho)/\delta.$$

Hence $\mathrm{SC}^*(\delta, \rho, \sigma = 0) < 1$ throughout the region $\rho < \rho_{\mathrm{MSE}}(\delta)$.
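The statement that $\mathrm{SC}^*(\delta,\rho,0) = M^{\pm}(\delta\rho)/\delta$ crosses 1 exactly at $\rho_{\mathrm{MSE}}(\delta)$ can be checked numerically. The sketch below assumes the standard description of the worst case for soft thresholding over $\mathcal{F}_\varepsilon$: mass $1-\varepsilon$ at 0 and mass $\varepsilon$ pushed to infinity. Under that description the worst-case risk is $(1-\varepsilon)\mathbb{E}\eta(Z;\tau)^2 + \varepsilon(1+\tau^2)$, which is convex in $\tau$. Function names are hypothetical. As a rough cross-check, the $\rho$ values listed in Table 3 correspond to $\rho_{\mathrm{MSE}}(0.25) \approx 0.268$ and $\rho_{\mathrm{MSE}}(0.5) \approx 0.385$.

```python
import math

def Phi(x):
    # Standard normal CDF.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    # Standard normal density.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def minimax_risk(eps, tol=1e-12):
    # M^{±}(eps) = min over tau of (1 - eps) * E eta(Z; tau)^2 + eps * (1 + tau^2).
    # The objective is convex in tau; its derivative vanishes where
    # 2 (1 - eps)(phi(tau) - tau Phi(-tau)) = eps * tau, located here by bisection.
    f = lambda t: 2 * (1 - eps) * (phi(t) - t * Phi(-t)) - eps * t
    lo, hi = 0.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    tau = 0.5 * (lo + hi)
    null_risk = 2 * ((1 + tau**2) * Phi(-tau) - tau * phi(tau))
    return (1 - eps) * null_risk + eps * (1 + tau**2)

def rho_mse(delta, tol=1e-10):
    # Solve M^{±}(rho * delta) = delta by bisection; M^{±} increases with eps.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if minimax_risk(mid * delta) < delta else (lo, mid)
    return 0.5 * (lo + hi)

def sc_star_noiseless(delta, rho):
    # SC*(delta, rho, sigma = 0) = M^{±}(delta * rho) / delta
    return minimax_risk(delta * rho) / delta

delta = 0.25
r = rho_mse(delta)
print(f"rho_MSE({delta}) ~ {r:.4f}")
for frac in (0.5, 0.95, 1.5):
    print(f"rho = {frac} * rho_MSE: SC* = {sc_star_noiseless(delta, frac * r):.3f}")
```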
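Define

$$\mathrm{SC}^*(\delta,\rho) = \sup_{\sigma>0}\ \sup_{\nu\in\mathcal{F}_{\delta\rho}} \mathrm{SC}(\Psi(\cdot;\delta,\sigma,\nu,\tau)).$$

Concavity of the MSE map at noise level $\sigma$ implies

$$\mathrm{SC}^*(\delta,\rho) = \mathrm{SC}^*(\delta,\rho,\sigma = 0).$$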

We therefore conclude that throughout the region $\rho < \rho_{\mathrm{MSE}}(\delta)$, not only is the stability coefficient smaller than 1, $\mathrm{SC}(\Psi) < 1$, but it can be bounded away from 1 uniformly in the signal distribution $\nu$. For this reason, that region can also be called the stability phase.

Lemma 4.2. Throughout the region $\rho < \rho_{\mathrm{MSE}}(\delta)$, $0 < \delta < 1$, for every $\nu \in \mathcal{F}_{\delta\rho}$, we have $\mathrm{SC}(\Psi) \le \mathrm{SC}^*(\delta,\rho) < 1$.

Outside the stability region, for each large $m$, we can find measures $\nu$ obeying the sparsity constraint $\nu \in \mathcal{F}_{\delta\rho}$ for which state evolution converges to a fixed point suffering equilibrium MSE $> m$. The construction in Section 4.5 shows that $\mathrm{HFP}(\Psi) > \mu_2(\nu) > m$. Figure 7 shows the MSE map and the state evolution in three cases, which may be compared to Figure 6. In the first case, $\rho$ is well below $\rho_{\mathrm{MSE}}$ and the fixed point is well below $\mu_2(\nu)$. In the second case, $\rho$ is slightly below $\rho_{\mathrm{MSE}}$ and the fixed point is close to $\mu_2(\nu)$. In the third case, $\rho$ is above $\rho_{\mathrm{MSE}}$ and the fixed point lies above $\mu_2(\nu)$.

$\mu_2(\nu)$ is the MSE one suffers by 'doing nothing': setting threshold $\lambda = \infty$ and taking $\hat{x} = 0$. When $\mathrm{HFP}(\Psi) > \mu_2(\nu)$, one iteration of thresholding makes things worse, not better. In words, the phase boundary is exactly the place below which we are sure that, if $\mu_2(\nu)$ is large, a single iteration of thresholding gives an estimate $x^1$ that is better than the starting point $x^0$. Above the phase boundary, even a single iteration of thresholding may be a catastrophically bad thing to do.

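Definition 4.3. (Equilibrium States and State-Conditional Expectations)

Consider a real-valued function $\zeta : \mathbb{R}^3 \to \mathbb{R}$; its expectation in state $S = (m; \delta, \sigma, \nu)$ is

$$\mathcal{E}(\zeta|S) = \mathbb{E}\big\{\zeta\big(X, \sqrt{\mathrm{npi}}\,Z, \eta(X + \sqrt{\mathrm{npi}}\,Z; \tau\sqrt{\mathrm{npi}})\big)\big\},$$

where $\mathrm{npi} = \mathrm{npi}(m;\sigma,\delta)$ and $X \sim \nu$, $Z \sim N(0,1)$ are independent random variables.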

Suppose we are given $(\delta, \sigma, \nu, \tau)$ and a fixed point $m^*$, $m^* = \mathrm{HFP}(\Psi)$ with $\Psi = \Psi(\cdot; \delta, \sigma, \nu, \tau)$. The tuple $S^* = (m^*; \delta, \sigma, \nu)$ is called the equilibrium state of state evolution. The expectation in the equilibrium state is $\mathcal{E}(\zeta|S^*)$.

Definition 4.4. (State Evolution Formalism for AMPT). Run the AMPT algorithm and assume that the sequence of estimates converges to the fixed point $(x^\infty, z^\infty)$. To each function $\zeta : \mathbb{R}^3 \to \mathbb{R}$ associate the observable

$$J_\zeta(y, A, x^0, x) = \frac{1}{N}\sum_{i=1}^N \zeta\big(x^0(i), (A^Tz)(i) + x(i) - x^0(i), x(i)\big).$$
Figure 7: Crossing the phase transition: effects on MSE Map $\Psi$, and associated state evolution. Left: $\delta = 0.25$, $\rho = \rho_{\mathrm{MSE}}/2$, $\sigma = 1$, $\nu = \nu(\delta,\rho,0.01)$. Middle: $\delta = 0.25$, $\rho = 0.9\cdot\rho_{\mathrm{MSE}}$, $\sigma = 1$, $\nu = \nu(\delta,\rho,0.01)$. Right: $\delta = 0.25$, $\rho = 1.5\cdot\rho_{\mathrm{MSE}}$, $\sigma = 1$, $\nu = \nu(\delta,\rho,0.01)$. In each case $\tau = \tau^{\pm}(\delta\rho)$. (Panel titles: $\rho = 0.13$, $\tau = 1.54$, $\mu = 3.69$; $\rho = 0.26$, $\tau = 1.30$, $\mu = 3.49$; $\rho = 0.40$, $\tau = 1.14$, $\mu = 3.37$. Horizontal axis in each panel: MSE in.)



Let $S^*$ denote the equilibrium state reached by state evolution in a given situation $(\delta, \sigma, \nu, \tau)$. The state evolution formalism assigns the purported limit value

$$\mathrm{Formal}(J_\zeta) = \mathcal{E}(\zeta|S^*).$$



Validity of the state evolution formalism for AMPT entails that, for a sequence of problem instances $(y, A, x^0)_{n,N}$ drawn from $\mathrm{LSF}(\delta, \rho, \sigma, \nu)$, the large-system limit for observable $J_{\zeta,n,N}$ is simply the expectation in the equilibrium state:

$$\mathrm{ls}\lim J_{\zeta,n,N} = \mathcal{E}(\zeta|S^*).$$

The class $\mathcal{J}$ of observables representable in this form is quite rich, by choosing $\zeta(u, v, w)$ appropriately. Table 1 gives examples of well-known observables and the $\zeta$ which will generate them. Formal values for other interesting observables can in principle be obtained by combining such simple ones. For example, the False Discovery Rate FDR is the ratio FDeR/DR, and so the ratio of two elementary observables of the kind for which the formalism is defined. We assign it the purported limit value

$$\mathrm{Formal}(\mathrm{FDR}) = \frac{\mathrm{Formal}(\mathrm{FDeR})}{\mathrm{Formal}(\mathrm{DR})}.$$

Below we list a number of observables for which the formalism was checked empirically and that play an important role in characterizing the fixed point estimates.

Calculation of Formal Operating Characteristics of AMPT($\tau$) by State Evolution
Name                      Abbrev.   ζ = ζ(u, v, w)
Mean Square Error         MSE       $\zeta = (u - w)^2$
False Alarm Rate          FAR       $\zeta = 1_{\{w \neq 0 \,\&\, u = 0\}}/(1 - \rho\delta)$
Detection Rate            DR        $\zeta = 1_{\{w \neq 0\}}$
Missed Detection Rate     MDR       $\zeta = 1_{\{w = 0 \,\&\, u \neq 0\}}/(\rho\delta)$
False Detection Rate      FDeR      $\zeta = 1_{\{w \neq 0 \,\&\, u = 0\}}/(\rho\delta)$

Table 1: Some observables and their names.
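A minimal sketch of how the entries of Table 1 become empirical observables $J_\zeta$. It assumes NumPy arrays `x0` (true signal, the $u$ argument), `v` (the effective noise $A^Tz + x - x^0$) and `xhat` (the estimate, the $w$ argument). The toy vectors are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

# Each zeta takes (u, v, w) = (true value, effective noise, estimate) plus the
# sparsity parameters (rho, delta) used for normalization in Table 1.
def zeta_mse(u, v, w, rho, delta):
    return (u - w) ** 2

def zeta_far(u, v, w, rho, delta):
    return ((w != 0) & (u == 0)) / (1 - rho * delta)

def zeta_dr(u, v, w, rho, delta):
    return (w != 0).astype(float)

def zeta_mdr(u, v, w, rho, delta):
    return ((w == 0) & (u != 0)) / (rho * delta)

def zeta_fder(u, v, w, rho, delta):
    return ((w != 0) & (u == 0)) / (rho * delta)

def observable(zeta, x0, v, xhat, rho, delta):
    # J_zeta = (1/N) * sum_i zeta(x0(i), v(i), xhat(i))
    return float(np.mean(zeta(x0, v, xhat, rho, delta)))

# Toy example (made-up numbers): N = 8 and k = 2 nonzeros, so rho * delta = k / N = 0.25.
x0 = np.array([0.0, 0.0, 2.0, 0.0, -1.0, 0.0, 0.0, 0.0])
xhat = np.array([0.0, 0.5, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0])
v = np.zeros_like(x0)   # the effective noise is not used by these five zetas
rho, delta = 0.5, 0.5
values = {name: observable(z, x0, v, xhat, rho, delta)
          for name, z in [("MSE", zeta_mse), ("FAR", zeta_far), ("DR", zeta_dr),
                          ("MDR", zeta_mdr), ("FDeR", zeta_fder)]}
print(values)
```

With these normalizations FAR is the fraction of true zeros declared nonzero (here 1 of 6), and MDR is the fraction of true nonzeros missed (here 1 of 2).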



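Given $\delta, \sigma, \nu, \tau$, identify the fixed point $\mathrm{HFP}(\Psi(\cdot; \delta, \sigma, \nu, \tau))$. Calculate the following quantities:

• Equilibrium MSE

$$\mathrm{EqMSE} = m_\infty = \mathrm{HFP}(\Psi(\cdot;\nu,\tau);\delta,\sigma).$$

• Equilibrium Noise Plus Interference Level

$$\mathrm{npi}_\infty = \frac{1}{\delta}m_\infty + \sigma^2.$$

• Equilibrium Threshold (absolute units)

$$\theta_\infty = \tau\cdot\sqrt{\mathrm{npi}_\infty}.$$

• Equilibrium Mean Squared Residual. Let $Y_\infty = X + \sqrt{\mathrm{npi}_\infty}\,Z$, where $X \sim \nu$ and $Z \sim N(0,1)$ are independent. Then

$$\mathrm{EqMSR} = \mathbb{E}\{[Y_\infty - \eta(Y_\infty;\theta_\infty)]^2\}.$$

• Equilibrium Mean Absolute Estimate

$$\mathrm{EqMAE} = \mathbb{E}\{|\eta(Y_\infty;\theta_\infty)|\}.$$

• Equilibrium Detection Rate

$$\mathrm{EqDR} = P\{\eta(Y_\infty;\theta_\infty) \neq 0\}. \qquad (4.5)$$

• Equilibrium Penalized MSR

$$\mathrm{EqPMSR} = \mathrm{EqMSR}/2 + \theta_\infty\cdot(1 - \mathrm{EqDR}/\delta)\cdot\mathrm{EqMAE}.$$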
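The recipe above can be carried out in closed form when $\nu$ is a three-point mixture. The sketch assumes $\nu = (1-\varepsilon)\delta_0 + (\varepsilon/2)\delta_\mu + (\varepsilon/2)\delta_{-\mu}$ and works at unit noise after dividing $Y_\infty$ by $\sqrt{\mathrm{npi}_\infty}$. The Gaussian integrals for the residual, absolute estimate and detection rate are standard truncated-normal moments. The parameter values are illustrative and the function names are hypothetical.

```python
import math

def Phi(x):
    # Standard normal CDF.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    # Standard normal density.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Unit-noise quantities for Y = a + Z, Z ~ N(0, 1), threshold tau.
# Each is even in a, so the atoms +a and -a contribute equally.
def soft_risk(a, tau):
    # E(eta(Y; tau) - a)^2
    return (1 + tau**2 + (a**2 - tau**2 - 1) * (Phi(tau - a) - Phi(-tau - a))
            - (tau + a) * phi(tau - a) - (tau - a) * phi(tau + a))

def clip_sq(a, tau):
    # E(Y - eta(Y; tau))^2 = E[min(|Y|, tau)^2]
    b1, b2 = -tau - a, tau - a
    p_in = Phi(b2) - Phi(b1)                    # P(|Y| <= tau)
    ez = phi(b1) - phi(b2)                      # E[Z; b1 <= Z <= b2]
    ez2 = p_in - (b2 * phi(b2) - b1 * phi(b1))  # E[Z^2; b1 <= Z <= b2]
    return a * a * p_in + 2 * a * ez + ez2 + tau**2 * (1 - p_in)

def abs_est(a, tau):
    # E|eta(Y; tau)| = g(tau - a) + g(tau + a), with g(c) = E(Z - c)_+
    g = lambda c: phi(c) - c * Phi(-c)
    return g(tau - a) + g(tau + a)

def detect(a, tau):
    # P(eta(Y; tau) != 0) = P(|Y| > tau)
    return Phi(a - tau) + Phi(-a - tau)

def operating_characteristics(delta, sigma, eps, mu, tau, tol=1e-13, max_iter=100_000):
    # Assumed prior: (1 - eps) delta_0 + (eps/2)(delta_mu + delta_{-mu}).
    def avg(f, a):
        return (1 - eps) * f(0.0, tau) + eps * f(a, tau)

    def psi(m):
        npi = sigma**2 + m / delta
        return npi * avg(soft_risk, mu / math.sqrt(npi))

    m = eps * mu**2                              # m_0 = mu_2(nu)
    for _ in range(max_iter):
        m_prev, m = m, psi(m)
        if abs(m - m_prev) < tol:
            break
    npi = sigma**2 + m / delta                   # npi_inf = m_inf / delta + sigma^2
    s = math.sqrt(npi)
    a = mu / s                                   # amplitude in units of sqrt(npi_inf)
    theta = tau * s                              # absolute threshold theta_inf
    eq = {"EqMSE": m, "npi": npi, "theta": theta,
          "EqMSR": npi * avg(clip_sq, a),
          "EqMAE": s * avg(abs_est, a),
          "EqDR": avg(detect, a)}
    eq["EqPMSR"] = eq["EqMSR"] / 2 + theta * (1 - eq["EqDR"] / delta) * eq["EqMAE"]
    return eq

eq = operating_characteristics(delta=0.25, sigma=1.0, eps=0.25 * 0.134, mu=3.69, tau=1.54)
for name, value in eq.items():
    print(f"{name:7s} {value:.4f}")
```

One way to sanity-check the residual formula: by Stein's lemma, $\mathrm{EqMSR} = \mathrm{EqMSE} + \mathrm{npi}_\infty(1 - 2\,\mathrm{EqDR})$ at the fixed point, and the computed values should satisfy this up to rounding.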

4.3 AMPT - LASSO Calibration 

Of course, at this point the reader is entitled to feel that the introduction of AMPT is a massive digression. The relevance of AMPT is indicated by the following conclusion from [DMM10b]:

Finding 4.1. In the large system limit, the operating characteristics of AMPT($\tau$) are equivalent to those of LASSO($\lambda$) under an appropriate calibration $\tau \leftrightarrow \lambda$.
By calibration, we mean a rescaling that maps results on one problem into results on the other problem. The notion is explained at greater length in [DMM10b]. The correct mapping can be guessed from the following remarks:

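• LASSO($\lambda$): no residual exceeds $\lambda$: $\|A^T(y - Ax^{1,\lambda})\|_\infty \le \lambda$. Further

$$x_i^{1,\lambda} > 0 \Leftrightarrow (A^T(y - Ax^{1,\lambda}))_i = \lambda,$$
$$x_i^{1,\lambda} = 0 \Leftrightarrow |(A^T(y - Ax^{1,\lambda}))_i| < \lambda,$$
$$x_i^{1,\lambda} < 0 \Leftrightarrow (A^T(y - Ax^{1,\lambda}))_i = -\lambda.$$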

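• AMPT($\tau$): At a fixed point $x^\infty, z^\infty$, no working residual exceeds the equilibrium threshold $\theta_\infty$: $\|A^Tz^\infty\|_\infty \le \theta_\infty$. Further

$$x_i^\infty > 0 \Leftrightarrow (A^Tz^\infty)_i = \theta_\infty,$$
$$x_i^\infty = 0 \Leftrightarrow |(A^Tz^\infty)_i| < \theta_\infty,$$
$$x_i^\infty < 0 \Leftrightarrow (A^Tz^\infty)_i = -\theta_\infty.$$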

Define $df = \#\{i : x_i^\infty \neq 0\}$. Further notice that at the AMPT fixed point $(1 - df/n)z^\infty = y - Ax^\infty$. We can summarize these remarks in the following statement.

Lemma 4.3. Solutions $x^{1,\lambda}$ of LASSO($\lambda$) (i.e. optima of the problem (1.2)) are in correspondence with fixed points $(x^\infty, z^\infty)$ of AMPT($\tau$) under the bijection $x^\infty = x^{1,\lambda}$, $z^\infty = (y - Ax^{1,\lambda})/(1 - df/n)$, provided the threshold parameters are in the following relation

$$\lambda = \theta_\infty\cdot(1 - df/n). \qquad (4.6)$$

In other words, if we have a fixed point of AMPT($\tau$) we can choose $\lambda$ in such a way that this is also an optimum of LASSO($\lambda$). Vice versa, any optimum of LASSO($\lambda$) can be realized as a fixed point of AMPT($\tau$): notice in fact that the relation (4.6) is invertible whenever $df < n$.
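The correspondence of Lemma 4.3 can be checked mechanically on a toy problem. The sketch below uses an orthogonal design $A = I$ (so $n = N$), where the LASSO solution is simply $\eta(y;\lambda)$. That choice only exercises the algebra of (4.6) and says nothing about the large-system regime. It checks the LASSO conditions listed above, maps the solution to $(z^\infty, \theta_\infty)$, and confirms that the result satisfies the AMPT fixed-point relations $x = \eta(x + A^Tz;\theta)$ and $(1 - df/n)z = y - Ax$. The helper names are hypothetical.

```python
import numpy as np

def soft(x, theta):
    # Soft thresholding eta(x; theta), applied elementwise.
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def lasso_kkt_ok(A, y, x, lam, tol=1e-9):
    # Optimality conditions of LASSO(lambda) as listed above.
    c = A.T @ (y - A @ x)
    on = x != 0
    return bool(np.all(np.abs(c) <= lam + tol)
                and np.allclose(c[on], lam * np.sign(x[on]), rtol=0.0, atol=tol))

def lasso_to_ampt(A, y, x, lam):
    # Lemma 4.3: z = (y - A x) / (1 - df/n) and theta = lambda / (1 - df/n).
    n = A.shape[0]
    df = np.count_nonzero(x)
    if df >= n:
        raise ValueError("relation (4.6) is only invertible when df < n")
    scale = 1.0 - df / n
    return (y - A @ x) / scale, lam / scale

A = np.eye(5)
y = np.array([3.0, -0.5, 1.2, -2.0, 0.1])
lam = 1.0
x = soft(y, lam)                      # LASSO solution for an orthogonal design
z, theta = lasso_to_ampt(A, y, x, lam)
print(lasso_kkt_ok(A, y, x, lam), theta)
print(np.allclose(soft(x + A.T @ z, theta), x))
```

Here $df = 3$ and $n = 5$, so the AMPT threshold is $\theta_\infty = \lambda/(1 - 3/5) = 2.5$.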

This simple rule gives a calibration relationship between $\tau$ and $\lambda$, i.e. a one-to-one correspondence between $\tau$ and $\lambda$ that renders the two apparently different reconstruction procedures equivalent, provided the iteration AMPT($\tau$) converges rapidly to its fixed point. Our empirical results confirm that this is indeed what happens for typical large-system frameworks $\mathrm{LSF}(\delta, \rho, \sigma, \nu)$.

The next lemma characterizes the equilibrium calibration relation between AMP and LASSO. 

Lemma 4.4. Let $\mathrm{EqDR}(\tau) = \mathrm{EqDR}(\tau; \delta, \rho, \nu, \sigma)$ denote the equilibrium detection rate obtained from state evolution when the tuning parameter of AMPT is $\tau$. Define $\tau_0(\delta, \rho, \nu, \sigma) > 0$ so that $\mathrm{EqDR}(\tau) \le \delta$ when $\tau > \tau_0$. For each $\lambda > 0$, there is a unique value $\tau(\lambda) \in [\tau_0, \infty)$ such that

$$\lambda = \theta_\infty(\tau)\cdot(1 - \mathrm{EqDR}(\tau)/\delta).$$

We can restate Finding 4.1 in the following more convenient form.



Finding 4.2. For each $\lambda \in [0, \infty)$ we find that AMPT($\tau(\lambda)$) and LASSO($\lambda$) have statistically equivalent observables. In particular the MSE, MAE, MSR and DR have the same distributions.
Figure 8: Contour lines of $\tau^*(\delta,\rho)$ in the $(\delta,\rho)$ plane. The dotted line corresponds to the phase transition $(\delta, \rho_{\mathrm{MSE}}(\delta))$, while thin lines are contours for $\tau^*(\delta,\rho)$.



4.4 Derivation of Proposition 3.1 



Consider the following minimax problem for AMPT($\tau$). With $\mathrm{fMSE}(\tau; \delta, \rho, \sigma, \nu)$ denoting the equilibrium formal MSE for AMPT($\tau$) for the framework $\mathrm{LSF}(\delta, \rho, \sigma, \nu)$, fix $\sigma = 1$ and define

$$M^\flat(\delta, \rho) = \inf_\tau \sup_{\nu\in\mathcal{F}_{\delta\rho}} \mathrm{fMSE}(\tau; \delta, \rho, \sigma = 1, \nu). \qquad (4.7)$$

We will first show that this definition obeys a formula just like the one given for $M^*$ in Proposition 3.1. Later we show that $M^\flat = M^*$.



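Proposition 4.1. For $M^\flat$ defined by (4.7),

$$M^\flat(\delta,\rho) = \frac{M^{\pm}(\delta\rho)}{1 - M^{\pm}(\delta\rho)/\delta}, \qquad 0 < \rho < \rho_{\mathrm{MSE}}(\delta). \qquad (4.8)$$

The AMPT threshold rule

$$\tau^*(\delta,\rho) = \tau^{\pm}(\delta\rho), \qquad 0 < \rho < \rho_{\mathrm{MSE}}(\delta), \qquad (4.9)$$

minimaxes the formal MSE:

$$\sup_{\nu\in\mathcal{F}_{\delta\rho}} \mathrm{fMSE}(\tau^*; \delta, \rho, 1, \nu) = \inf_\tau \sup_{\nu\in\mathcal{F}_{\delta\rho}} \mathrm{fMSE}(\tau; \delta, \rho, 1, \nu) = M^\flat(\delta, \rho). \qquad (4.10)$$

Figure 8 depicts the behavior of $\tau^*$ in the $(\delta,\rho)$ plane.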
Proof (Proposition 4.1). Consider $\nu \in \mathcal{F}_{\delta\rho}$ and $\sigma^2 = 1$, and set $\tau^*(\delta,\rho) = \tau^{\pm}(\delta\rho)$ as in the statement. For short, let $\Psi(m;\nu) = \Psi(m; \delta, \sigma = 1, \tau^*, \nu) = \mathrm{mse}(\mathrm{npi}(m, 1, \delta); \nu, \tau^*)$. Then $m^* = \mathrm{HFP}(\Psi)$ obeys, by definition of fixed point,

$$m^* = \Psi(m^*; \nu).$$

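We can use the scale invariance $\mathrm{mse}(\sigma^2; \nu, \tau^*) = \sigma^2\cdot\mathrm{mse}(1; \tilde\nu, \tau^*)$, where $\tilde\nu$ is a rescaled probability measure, $\tilde\nu\{x\cdot\sigma \in B\} = \nu\{x \in B\}$. For $\nu \in \mathcal{F}_{\delta\rho}$ we have $\tilde\nu \in \mathcal{F}_{\delta\rho}$ as well, and we therefore obtain

$$m^* = \mathrm{mse}(\mathrm{npi}(m^*, 1, \delta); \nu, \tau^*) = \mathrm{mse}(1; \tilde\nu, \tau^*)\cdot\mathrm{npi}(m^*, 1, \delta) \le M^{\pm}(\delta\rho)\cdot\mathrm{npi}(m^*, 1, \delta),$$

where we used the fact that $\tau^*(\delta,\rho) = \tau^{\pm}(\delta\rho)$. Hence

$$\frac{m^*}{\mathrm{npi}(m^*; 1, \delta)} \le M^{\pm}(\delta\rho).$$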

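The function $m \mapsto m/\mathrm{npi}(m; 1, \delta)$ is one-to-one and strictly increasing on $[0, \infty)$, with range $[0, \delta)$. Thus, provided that $1 - M^{\pm}(\delta\rho)/\delta > 0$, i.e. $\rho < \rho_{\mathrm{MSE}}(\delta)$, we have

$$m^* \le \frac{M^{\pm}(\delta\rho)}{1 - M^{\pm}(\delta\rho)/\delta}.$$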
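Spelling out the inversion step: with $\mathrm{npi}(m;1,\delta) = 1 + m/\delta$ and $M = M^{\pm}(\delta\rho)$, multiplying through by the positive quantity $1 + m^*/\delta$ gives

$$\frac{m^*}{1 + m^*/\delta} \le M \iff m^* \le M + \frac{M}{\delta}\,m^* \iff m^*\Big(1 - \frac{M}{\delta}\Big) \le M \iff m^* \le \frac{M}{1 - M/\delta},$$

where the last step divides by $1 - M/\delta > 0$, which is exactly the condition $\rho < \rho_{\mathrm{MSE}}(\delta)$. When $M \ge \delta$, the left side of the third inequality is non-positive for every $m^* \ge 0$, so the inequality places no upper bound on $m^*$.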

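As this inequality applies to any HFP produced by our formalism, in particular the largest one consistent with $\nu \in \mathcal{F}_{\delta\rho}$, we have

$$\sup_{\nu\in\mathcal{F}_{\delta\rho}} \mathrm{fMSE}(\tau^*; \delta, \rho, 1, \nu) \le \frac{M^{\pm}(\delta\rho)}{1 - M^{\pm}(\delta\rho)/\delta}.$$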

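We now develop the reverse inequality. To do so, we make a specific choice $\bar\nu$ of $\nu$. Fix $\alpha > 0$ small. Now for $\varepsilon = \delta\rho$, define $\xi = \mu^{\pm}(\varepsilon, \alpha)\sqrt{\mathrm{NPI}^*}$, where $\mathrm{NPI}^* = 1 + M^\flat/\delta$ (with $M^\flat = M^{\pm}(\delta\rho)/(1 - M^{\pm}(\delta\rho)/\delta)$ as in the thesis). Let $\bar\nu = (1 - \varepsilon)\delta_0 + (\varepsilon/2)\delta_{\xi} + (\varepsilon/2)\delta_{-\xi}$. Denote by $m^* = m^*(\bar\nu)$ the highest fixed point corresponding to the signal distribution $\bar\nu$. Using once again scale invariance, we have

$$m^* = \mathrm{mse}(\mathrm{npi}(m^*, 1, \delta); \bar\nu, \tau^*) = \mathrm{mse}(1; \tilde\nu, \tau^*)\cdot\mathrm{npi}(m^*, 1, \delta), \qquad (4.11)$$

where $\tilde\nu$ is again a rescaled probability measure, this time with $\tilde\nu\{x\cdot\sqrt{\mathrm{npi}(m^*, 1, \delta)} \in B\} = \bar\nu\{x \in B\}$. Now since $m^* \le M^\flat$, we have $\mathrm{npi}(m^*, 1, \delta) \le \mathrm{NPI}^*$, and hence

$$\frac{\xi}{\sqrt{\mathrm{npi}(m^*, 1, \delta)}} = \mu^{\pm}(\varepsilon, \alpha)\cdot\sqrt{\frac{\mathrm{NPI}^*}{\mathrm{npi}(m^*, 1, \delta)}} \ge \mu^{\pm}(\varepsilon, \alpha).$$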

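Note that $\mathrm{mse}(1; (1 - \varepsilon)\delta_0 + (\varepsilon/2)\delta_{-x} + (\varepsilon/2)\delta_x, \tau)$ is monotone increasing in $|x|$. Recall that $\nu_{\varepsilon,\alpha} = (1 - \varepsilon)\delta_0 + (\varepsilon/2)\delta_{-\mu^{\pm}(\varepsilon,\alpha)} + (\varepsilon/2)\delta_{\mu^{\pm}(\varepsilon,\alpha)}$ is $\alpha$-least favorable for the minimax problem (2.4). Consequently,

$$\mathrm{mse}(1; \tilde\nu, \tau^*) \ge \mathrm{mse}(1; \nu_{\delta\rho,\alpha}, \tau^*) = (1 - \alpha)\cdot M^{\pm}(\delta\rho).$$

Using the scale-invariance relation, Eq. (4.11), we conclude that

$$\frac{m^*}{\mathrm{npi}(m^*; 1, \delta)} \ge (1 - \alpha)\cdot M^{\pm}(\delta\rho).$$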
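Again, in the region $\rho < \rho_{\mathrm{MSE}}(\delta)$, the function $m \mapsto m/\mathrm{npi}(m; 1, \delta)$ is one-to-one and monotone, and therefore

$$\mathrm{fMSE}(\tau^*; \delta, \rho, 1, \bar\nu) = m^* \ge \frac{(1 - \alpha)M^{\pm}(\delta\rho)}{1 - (1 - \alpha)M^{\pm}(\delta\rho)/\delta}.$$

As $\alpha > 0$ is arbitrary, we conclude

$$\sup_{\nu\in\mathcal{F}_{\delta\rho}} \mathrm{fMSE}(\tau^*; \delta, \rho, 1, \nu) \ge \frac{M^{\pm}(\delta\rho)}{1 - M^{\pm}(\delta\rho)/\delta}.$$

□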

We now explain how this result about AMPT leads to our claim for the behavior of the LASSO estimator $x^{1,\lambda}$. By scale invariance, the quantity (1.5) can be rewritten as a fixed-scale ($\sigma = 1$) property:

$$M^*(\delta,\rho) = \sup_\nu \inf_\lambda \mathrm{fMSE}(\nu, \lambda|\mathrm{LASSO}),$$

where we introduced explicit reference to the algorithm used, and dropped the irrelevant arguments. We will analogously write $\mathrm{fMSE}(\nu, \tau|\mathrm{AMPT})$ for the AMPT($\tau$) MSE.

Proposition 4.2. Assume the validity of our calibration relation, i.e. the equivalence of formal operating characteristics of AMPT($\tau$) and LASSO($\lambda(\tau)$). Then

$$M^*(\delta,\rho) = M^\flat(\delta,\rho).$$

Also, for $\lambda^*$ as defined in Proposition 3.1,

$$M^*(\delta,\rho) = \sup_\nu \mathrm{fMSE}(\nu, \lambda^*(\nu; \delta, \rho, \sigma)|\mathrm{LASSO}).$$

In words, $\lambda^*$ is the maximin penalization and the maximin MSE of LASSO is precisely given by the formula (4.8).

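Proof. Taking the validity of our calibration relationship $\tau \leftrightarrow \lambda(\tau)$ as given, we must have

$$\mathrm{fMSE}(\nu, \lambda(\tau)|\mathrm{LASSO}) = \mathrm{fMSE}(\nu, \tau|\mathrm{AMPT}).$$

Our definition of $\lambda^*$ in Proposition 3.1 is simply the calibration relation applied to the minimax AMPT threshold $\tau^*$, i.e. $\lambda^* = \lambda(\tau^*)$. Hence, assuming the validity of our calibration relation, we have:

$$\begin{aligned}
\sup_\nu \mathrm{fMSE}(\nu, \lambda^*(\nu; \delta, \rho, \sigma)|\mathrm{LASSO}) &= \sup_\nu \mathrm{fMSE}(\nu, \lambda(\tau^*)|\mathrm{LASSO}) \\
&= \sup_\nu \mathrm{fMSE}(\nu, \tau^*|\mathrm{AMPT}) \\
&= \sup_\nu \inf_\tau \mathrm{fMSE}(\nu, \tau|\mathrm{AMPT}) \qquad (4.12) \\
&= \sup_\nu \inf_\tau \mathrm{fMSE}(\nu, \lambda(\tau)|\mathrm{LASSO}) \\
&= \sup_\nu \inf_\lambda \mathrm{fMSE}(\nu, \lambda|\mathrm{LASSO}).
\end{aligned}$$

Display (4.12) shows that all these expressions are equal to $M^\flat(\delta, \rho)$. □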



The proof of Proposition 3.1, points 1a, 1b and 1c, follows immediately from the above.
4.5 Formal MSE above Phase Transition

We now make an explicit construction showing that noise sensitivity is unbounded above PT. 

We first consider the AMPT algorithm above PT. Fix $\delta, \rho$ with $\rho > \rho_{\mathrm{MSE}}(\delta)$ and set $\varepsilon = \delta\rho$.

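In this section we focus on 3-point distributions with mass at 0 equal to $1 - \varepsilon$. With an abuse of notation, we let $\mathrm{mse}(\mu, \tau)$ denote the MSE of scalar soft thresholding for amplitude of the nonzeros equal to $\mu$, and noise variance equal to 1. In formulas, $\mathrm{mse}(\mu, \tau) = \mathrm{mse}(1; (1 - \varepsilon)\delta_0 + (\varepsilon/2)\delta_\mu + (\varepsilon/2)\delta_{-\mu}, \tau)$, and

$$\mathrm{mse}(\mu, \tau) = (1 - \varepsilon)\,\mathbb{E}\,\eta(Z; \tau)^2 + \varepsilon\,\mathbb{E}(\mu - \eta(\mu + Z; \tau))^2.$$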
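For reference, the two expectations in this formula have standard closed forms, obtained by splitting on the three regions where $\eta(y;\tau)$ equals $y - \tau$, $0$ or $y + \tau$ ($\Phi$ and $\phi$ denote the standard normal CDF and density):

$$\mathbb{E}\,\eta(Z;\tau)^2 = 2\big[(1+\tau^2)\Phi(-\tau) - \tau\phi(\tau)\big],$$

$$\mathbb{E}(\mu - \eta(\mu+Z;\tau))^2 = 1 + \tau^2 + (\mu^2 - \tau^2 - 1)\big[\Phi(\tau-\mu) - \Phi(-\tau-\mu)\big] - (\tau+\mu)\phi(\tau-\mu) - (\tau-\mu)\phi(\tau+\mu).$$

The second expression equals the first at $\mu = 0$, has derivative $2\mu[\Phi(\tau-\mu) - \Phi(-\tau-\mu)] \ge 0$ in $\mu$, and tends to $1 + \tau^2$ as $\mu \to \infty$. Hence $\mathrm{mse}(\mu,\tau)$ increases from $\mathrm{mse}(0,\tau)$ up to $(1-\varepsilon)\,\mathbb{E}\,\eta(Z;\tau)^2 + \varepsilon(1+\tau^2)$.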

Consider values of the AMPT threshold $\tau$ such that $\mathrm{mse}(0, \tau) < \delta$; this will be possible for all $\tau$ sufficiently large. Pick a number $\gamma \in (0, 1)$ obeying

$$\mathrm{mse}(0, \tau)/\delta < \gamma < 1. \qquad (4.13)$$



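Let $M^{\pm}(\varepsilon, \tau) = \sup_\mu \mathrm{mse}(\mu, \tau)$ denote the worst-case risk of $\eta(\cdot; \tau)$ over the class $\mathcal{F}_\varepsilon$. Let $\mu^{\pm}(\varepsilon, \alpha, \tau)$ denote the $\alpha$-least-favorable $\mu$ for threshold $\tau$:

$$\mathrm{mse}(\mu^{\pm}, \tau) = (1 - \alpha)M^{\pm}(\varepsilon, \tau).$$

Define $\alpha^* = 1 - \gamma\delta/M^{\pm}(\varepsilon, \tau)$, and note that $\alpha^* \in (0, 1)$ by earlier assumptions. Let $\mu^* = \mu^{\pm}(\varepsilon, \alpha^*, \tau)$. A straightforward calculation along the lines of the previous section yields the following.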

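Lemma 4.5. For the measure $\nu = (1 - \varepsilon)\delta_0 + (\varepsilon/2)\delta_{\mu^*} + (\varepsilon/2)\delta_{-\mu^*}$, the formal MSE and formal NPI are given by

$$\mathrm{fMSE}(\nu, \tau|\mathrm{AMPT}) = \frac{\delta\gamma}{1 - \gamma},$$

$$\mathrm{fNPI}(\nu, \tau|\mathrm{AMPT}) = \frac{1}{1 - \gamma}.$$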
Assumption (4.13) permits us to choose $\gamma$ very close to 1. Hence the above formulas show explicitly that MSE is unbounded above phase transition.

What do the formulas say about $x^{1,\lambda}$ above PT? The $\tau$'s which can be associated to $\lambda$ obey

$$0 < \mathrm{EqDR}(\nu, \tau) < \delta,$$

where $\mathrm{EqDR}(\nu, \tau) = \mathrm{EqDR}(\tau; \delta, \rho, \nu, \sigma)$ is the equilibrium detection rate for a signal with distribution $\nu$. Equivalently, they are those $\tau$ where the equilibrium discovery number is $n$ or smaller.

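Lemma 4.6. For each $\tau > 0$ obeying both

$$\mathrm{mse}(0, \tau) < \delta \quad\text{and}\quad \mathrm{EqDR}(\nu, \tau) < \delta,$$

the parameter $\lambda > 0$ defined by the calibration relation

$$\lambda(\tau) = \frac{\tau}{\sqrt{1 - \gamma}}\cdot\big(1 - \mathrm{EqDR}(\nu, \tau)/\delta\big)$$

has the formal MSE

$$\mathrm{fMSE}(\nu, \lambda(\tau)|\mathrm{LASSO}) = \frac{\delta\gamma}{1 - \gamma}.$$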
One can check that, for each $\lambda > 0$ and for each phase space point above phase transition, the above construction yields a measure $\nu$ with mass $\varepsilon = \delta\rho$ on nonzeros and with arbitrarily high formal MSE. This completes the derivation of part 2 of Proposition 3.1.
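The construction of Lemmas 4.5 and 4.6 can be reproduced numerically. The sketch assumes $\sigma = 1$ and the three-point measure of this section. It also assumes, as our own reading of the lemma, that the actual nonzeros sit at $\pm\mu^*/\sqrt{1-\gamma}$. In other words, $\mu^* = \mu^{\pm}(\varepsilon,\alpha^*,\tau)$ is the least-favorable amplitude expressed in units of the equilibrium noise level. The code locates $\mu^*$ by bisection (using that $\mathrm{mse}(\mu,\tau)$ increases in $\mu$), runs state evolution, and returns the fixed point, the NPI, the detection rate and the calibrated $\lambda(\tau)$. The values $\delta = 0.25$, $\rho = 0.401$ echo Section 5.2. The threshold $\tau = 1.5$ and the $\gamma$ grid are arbitrary choices, and the function names are hypothetical.

```python
import math

def Phi(x):
    # Standard normal CDF.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    # Standard normal density.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def soft_risk(a, tau):
    # E(eta(a + Z; tau) - a)^2 at unit noise.
    return (1 + tau**2 + (a**2 - tau**2 - 1) * (Phi(tau - a) - Phi(-tau - a))
            - (tau + a) * phi(tau - a) - (tau - a) * phi(tau + a))

def mse3(eps, a, tau):
    # mse(a, tau): unit-noise risk of the three-point prior with amplitude a.
    return (1 - eps) * soft_risk(0.0, tau) + eps * soft_risk(a, tau)

def above_pt_example(delta, rho, tau, gamma, tol=1e-12):
    eps = delta * rho
    target = gamma * delta
    # (4.13) and alpha* in (0, 1): mse(0, tau) < gamma * delta < sup_mu mse(mu, tau).
    assert mse3(eps, 0.0, tau) < target < mse3(eps, 50.0, tau)
    lo, hi = 0.0, 50.0                 # bisection: mse3 increases with the amplitude
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mse3(eps, mid, tau) < target else (lo, mid)
    mu_star = 0.5 * (lo + hi)
    # Assumption: the nonzeros sit at +-mu_star * sqrt(NPI), with NPI = 1 / (1 - gamma).
    amp = mu_star / math.sqrt(1.0 - gamma)
    m = eps * amp**2                   # state evolution from mu_2(nu), sigma = 1
    for _ in range(100_000):
        npi = 1.0 + m / delta
        m_prev, m = m, npi * mse3(eps, amp / math.sqrt(npi), tau)
        if abs(m - m_prev) < tol:
            break
    npi = 1.0 + m / delta
    a = amp / math.sqrt(npi)
    eq_dr = (1 - eps) * 2 * Phi(-tau) + eps * (Phi(a - tau) + Phi(-a - tau))
    lam = tau * math.sqrt(npi) * (1 - eq_dr / delta)   # calibration of Lemma 4.6
    return m, npi, eq_dr, lam

delta, rho, tau = 0.25, 0.401, 1.5     # rho well above rho_MSE(0.25)
for gamma in (0.5, 0.9, 0.99):
    m, npi, eq_dr, lam = above_pt_example(delta, rho, tau, gamma)
    print(f"gamma={gamma}: fMSE={m:.4f} vs delta*gamma/(1-gamma)={delta * gamma / (1 - gamma):.4f}, "
          f"fNPI={npi:.3f}, EqDR={eq_dr:.3f}, lambda={lam:.3f}")
```

As $\gamma$ approaches 1 the computed formal MSE grows without bound while $\lambda(\tau)$ stays positive, which is the mechanism behind part 2 of Proposition 3.1.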
1.31  0.30
0.50  0.19  0.10  0.32  1.15  3.11   0.90   5.19  1.15  0.70
0.50  0.29  0.14  0.42  1.00  2.99   2.55   7.35  1.00  0.42
0.50  0.35  0.17  0.47  0.92  2.93   7.51  11.75  0.92  0.26
0.50  0.37  0.18  0.48  0.90  2.91  15.75  16.67  0.90  0.20

Table 2: Parameters of quasi-least-favorable settings studied in the empirical results presented here.



5 Empirical Validation 

So far our discussion explains how state evolution calculations are carried out so others might reproduce them. The actual 'science contribution' of our paper comes in showing that these calculations describe the actual behavior of solutions to (1.2). We check these calculations in two ways: first, to show that individual MSE predictions are accurate, and second, to show that the mathematical structures (least-favorable signals, minimax saddlepoint, maximin threshold) that lead to our predictions are visible in empirical results.

5.1 Below phase transition 

Let $\mathrm{fMSE}(\lambda; \delta, \rho, \sigma, \nu)$ denote the formal MSE we assign to $x^{1,\lambda}$ for problem instances from $\mathrm{LSF}(\delta, \rho, \sigma, \nu)$. Let $\mathrm{eMSE}(\lambda)_{n,N}$ denote the empirical MSE of the LASSO estimator in a problem instance drawn from $\mathrm{LSF}(\delta, \rho, \sigma, \nu)$ at a given problem size $n, N$. In claiming that the noise sensitivity of $x^{1,\lambda}$ is bounded above by $M^*(\delta,\rho)$, we are saying that in empirical trials the ratio $\mathrm{eMSE}/\sigma^2$ will not be larger than $M^*$ with statistical significance. We now present empirical evidence for this claim.

5.1.1 Accuracy of MSE at the LF signal

We first consider the accuracy of theoretical predictions at the nearly-least-favorable signals generated by $\nu_{\delta,\rho,\alpha} = (1 - \varepsilon)\delta_0 + (\varepsilon/2)\delta_{-\mu^*(\delta,\rho,\alpha)} + (\varepsilon/2)\delta_{\mu^*(\delta,\rho,\alpha)}$, defined by Part 2.6 of Proposition 3.1. If the empirical ratio $\mathrm{eMSE}/\sigma^2$ is substantially above the theoretical bound $M^*(\delta,\rho)$, according to standards of statistical significance, we have falsified the proposition.

We consider parameter points $\delta \in \{0.10, 0.25, 0.50\}$ and $\rho \in \{\tfrac{1}{2}\rho_{\mathrm{MSE}}, \tfrac{3}{4}\rho_{\mathrm{MSE}}, \tfrac{9}{10}\rho_{\mathrm{MSE}}, \tfrac{19}{20}\rho_{\mathrm{MSE}}\}$. The predictions of the SE formalism are detailed in Table 2.






δ       ρ       μ        λ*      fMSE     eMSE     SE
0.100   0.095    5.791   1.258    0.136    0.126   0.0029
0.100   0.142    8.242   0.804    0.380    0.329   0.0106
0.100   0.170   12.901   0.465    1.045    0.755   0.0328
0.100   0.180   18.278   0.338    2.063    1.263   0.0860
0.250   0.134    5.459   0.961    0.374    0.373   0.0046
0.250   0.201    7.683   0.592    1.028    1.002   0.0170
0.250   0.241   12.219   0.351    2.830    2.927   0.0733
0.250   0.254   17.314   0.244    5.576    5.169   0.1978
0.500   0.193    5.194   0.689    0.853    0.836   0.0078
0.500   0.289    7.354   0.400    2.329    2.251   0.0254
0.500   0.347   11.746   0.231    6.365    6.403   0.1157
0.500   0.366   16.667   0.159   12.427   11.580   0.2999

Table 3: Results at N = 1500. MSE of LASSO(λ*) at nearly-least-favorable situations, together with standard errors (SE).



Results at N = 1500

To test these predictions, we generate in each situation R = 200 random realizations of size N = 1500 from $\mathrm{LSF}(\delta, \rho, \sigma, \nu)$ with the parameters shown in Table 2, and run the LARS/LASSO solver to find the LASSO solution. Table 3 shows the empirical average MSE in 200 trials at each tested situation.
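A rough sketch of one cell of this experiment follows. It makes several assumptions that the text does not pin down: $A$ has iid $N(0, 1/n)$ entries; each instance has exactly $k = \mathrm{round}(\rho n)$ nonzeros of amplitude $\pm\mu$ at random positions; the LASSO objective is $\frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$; and ISTA (proximal gradient) stands in for the LARS/LASSO solver. The values $\mu = 5.459$ and $\lambda^* = 0.961$ are taken from the $\delta = 0.25$, $\rho = 0.134$ row of Table 3, and they only transfer if these conventions match the paper's. With N = 400 and R = 5 the result is subject to sizeable finite-N and sampling effects, so it should not be expected to reproduce the table. The helper names are hypothetical.

```python
import numpy as np

def soft(x, theta):
    # Soft thresholding eta(x; theta), elementwise.
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def draw_instance(n, N, rho, sigma, mu, rng):
    # Hypothetical sampler: iid N(0, 1/n) matrix, k = round(rho * n) nonzeros of
    # amplitude +-mu at random positions, Gaussian noise at level sigma.
    A = rng.standard_normal((n, N)) / np.sqrt(n)
    k = int(round(rho * n))
    x0 = np.zeros(N)
    idx = rng.choice(N, size=k, replace=False)
    x0[idx] = mu * rng.choice([-1.0, 1.0], size=k)
    y = A @ x0 + sigma * rng.standard_normal(n)
    return A, x0, y

def lasso_objective(A, y, x, lam):
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))

def lasso_ista(A, y, lam, iters=3000):
    # ISTA for (1/2)||y - Ax||^2 + lam * ||x||_1, used here in place of LARS.
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft(x + A.T @ (y - A @ x) / L, lam / L)
    return x

rng = np.random.default_rng(0)
delta, rho, sigma = 0.25, 0.134, 1.0
mu, lam = 5.459, 0.961                     # delta = 0.25, rho = 0.134 row of Table 3
N = 400
n = int(delta * N)
emse = []
for _ in range(5):                         # R = 5 here, versus R = 200 in the text
    A, x0, y = draw_instance(n, N, rho, sigma, mu, rng)
    xhat = lasso_ista(A, y, lam)
    emse.append(np.mean((xhat - x0) ** 2) / sigma**2)
print(f"eMSE/sigma^2 ~ {np.mean(emse):.3f} +- {np.std(emse) / np.sqrt(len(emse)):.3f}")
```

ISTA with step $1/L$ never increases the objective, so it is a safe if slow substitute; a homotopy solver such as LARS recovers the exact solution path instead.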

Except at $\delta = 0.10$, the mismatch between empirical and theoretical MSE is a few to several percent, which is reasonable given the sample size R = 200. At $\delta = 0.10$, $\rho = 0.180$, close to phase transition, there is a mismatch needing attention. (In fact, at each level of $\delta$ the most serious mismatch is at the value of $\rho$ closest to phase transition. This can be attributed partially to the blowup of the quantity being measured as we approach phase transition.) We will pursue this mismatch below.

We also ran trials at $\delta \in \{0.15, 0.20, 0.30, 0.35, 0.40, 0.45\}$. These cases exhibited the same patterns seen above, with adequate fit except at small $\delta$, especially near phase transition. We omit the data here.

In all our trials, we measured numerous observables, not only the MSE. The trend in mismatch between theory and observation in such observables was comparable to that seen for MSE. In [DMM09b, DMM10b], the reader can find discussion and presentation of evidence for other observables.

Results at N = 4000 

Statistics of random sampling dictate that there always be some measure of disagreement between empirical averages and expectations. When the expectations are taken in the large-system limit, as ours are, there are additional small-$N$ effects that appear separate from random sampling effects. However, both sorts of effects should visibly decline with increasing $N$.

Table 4 presents results for N = 4000; we expect the discrepancies to shrink when the experiments are run at larger values of $N$. We study the same $\rho$ and $\delta$ that were studied for N = 1500, and see that the mismatches in our MSEs have grown smaller with $N$.
δ       ρ       μ        λ*      fMSE     eMSE     SE
0.100   0.095    5.791   1.258    0.136    0.128   0.0016
0.100   0.142    8.242   0.804    0.380    0.348   0.0064
0.100   0.170   12.901   0.465    1.045    0.950   0.0228
0.100   0.180   18.278   0.338    2.063    1.588   0.0619
0.250   0.134    5.459   0.961    0.374    0.371   0.0028
0.250   0.201    7.683   0.592    1.028    1.023   0.0106
0.250   0.241   12.219   0.351    2.830    2.703   0.0448
0.250   0.254   17.314   0.244    5.576    5.619   0.0428
0.500   0.193    5.194   0.689    0.853    0.849   0.0047
0.500   0.289    7.354   0.400    2.329    2.296   0.016
0.500   0.347   11.746   0.231    6.365    6.237   0.0677
0.500   0.366   16.667   0.159   12.427   12.394   0.171

Table 4: Results at N = 4000. Theoretical and empirical MSEs of LASSO(λ*) at nearly-least-favorable situations, together with standard errors (SE).



δ       ρ       μ        λ*      fMSE     eMSE     SE
0.100   0.095    5.791   1.258    0.136    0.131   0.0012
0.100   0.142    8.242   0.804    0.380    0.378   0.0046
0.100   0.170   12.901   0.465    1.045    1.024   0.0186
0.100   0.180   18.278   0.338    2.063    1.883   0.0458

Table 5: Results at N = 8000. Theoretical and empirical MSEs of LASSO(λ*) at nearly-least-favorable situations with δ = 0.10, together with standard errors (SE) of the empirical MSEs.



Results at N = 8000

Small values of $\delta$ have the largest discrepancy, especially when $\rho$ is chosen very close to the phase transition curve. To show that this discrepancy shrinks as we increase $N$, we do a similar experiment for $\delta = 0.10$, but this time with N = 8000. Table 5 summarizes the results of this simulation and shows better agreement between the formal predictions and empirical results.

The alert reader will no doubt have noticed that the discrepancy between theoretical predictions and empirical results is in many cases quite a bit larger in magnitude than the formal standard errors reported in the above tables. We emphasize that the theoretical predictions are formal limits for the $N \to \infty$ case, while empirical results take place at finite $N$. In both statistics and statistical physics it is quite common for mismatches between finite-$N$ results and large-$N$ limits to occur as either $O(N^{-1/2})$ effects (e.g. Normal approximation to the Poisson) or $O(N^{-1})$ effects (e.g. Normal approximation to fair coin tossing). Analogously, we might anticipate mismatches in this setting of order $N^{-a}$ with $a$ either 1/2 or 1. Figure 9 presents empirical and theoretical results taken from the cases N = 1500, 4000 and 8000 and displays them on a common graph, with mean-squared error (empirical or theoretical) on the y-axis and the inverse system size $1/N$ on the x-axis. The case $1/N = 0$ presents the formal large-system limit predicted by our calculations, and the other cases $1/N > 0$ present empirical results described in the tables above. As can be seen, the discrepancy between formal MSE and empirical MSE tends to zero linearly with $1/N$. (A similar plot with $1/\sqrt{N}$ on the x-axis would not be so convincing.)

Figure 9: Finite-N scaling of empirical MSE. Empirical MSE results from the cases N = 1500, N = 4000 and N = 8000 and δ = 0.1. Vertical axis: empirical MSE. Horizontal axis: 1/N. Different colors/symbols indicate different values of the sparsity control parameter ρ. Vertical bars denote ±2SE limits. Theoretical predictions for the N = ∞ case appear at 1/N = 0. Lines connect the cases N = 1500 and N = ∞.
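The two kinds of comparison discussed above can be made concrete with the δ = 0.10 rows of Tables 3, 4 and 5. The sketch below performs only arithmetic on those published numbers. It reports the gap fMSE − eMSE as a fraction of fMSE and in units of the reported SE.

```python
# (fMSE, [(N, eMSE, SE), ...]) at delta = 0.10, copied from Tables 3, 4 and 5.
data = {
    0.095: (0.136, [(1500, 0.126, 0.0029), (4000, 0.128, 0.0016), (8000, 0.131, 0.0012)]),
    0.142: (0.380, [(1500, 0.329, 0.0106), (4000, 0.348, 0.0064), (8000, 0.378, 0.0046)]),
    0.170: (1.045, [(1500, 0.755, 0.0328), (4000, 0.950, 0.0228), (8000, 1.024, 0.0186)]),
    0.180: (2.063, [(1500, 1.263, 0.0860), (4000, 1.588, 0.0619), (8000, 1.883, 0.0458)]),
}

summary = {}
for rho, (fmse, runs) in data.items():
    rows = []
    for N, emse, se in runs:
        gap = fmse - emse
        rows.append((N, gap / fmse, gap / se))   # (N, relative gap, gap in SE units)
    summary[rho] = rows
    print(f"rho={rho}: " + ", ".join(f"N={N}: {rel:.1%} ({z:.1f} SE)" for N, rel, z in rows))
```

On these rows the relative gap shrinks steadily with $N$. The gap in SE units need not shrink, since the SE decreases as well; for ρ = 0.095 it goes from about 3.4 to 5.0 to 4.2 SE.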

Finding 5.1. The formal and empirical MSEs at the quasi saddlepoint $(\nu^*, \lambda^*)$ show statistical agreement in the cases studied. Either the MSEs are consistent with standard statistical sampling formulas, or, where they were not consistent at N = 1500, fresh data at N = 4000 and N = 8000 showed marked reductions in the anomalies, confirming that the anomalies decline with increasing $N$.



5.1.2 Existence of Game-Theoretic Saddlepoint in eMSE

Underlying our derivations of minimax formal MSE is a game-theoretic saddlepoint structure, illustrated in Figure 10. The loss function MSE has the following structure around the quasi saddlepoint $(\nu^*, \lambda^*)$: any variation of $\mu$ to lower values will cause a reduction in loss, while a variation of $\lambda$ to other values will cause an increase in loss.
Figure 10: Saddlepoint in formal MSE. Right panel: Behavior of formal MSE as $\lambda$ is varied away from $\lambda^*$. Left panel: Behavior of formal MSE as $\mu$ is varied away from $\mu^*$ in the direction of smaller values. Black lines indicate locations of $\mu^*$ and $\lambda^*$. $\delta = 0.25$, $\rho = \rho_{\mathrm{MSE}}(\delta)/2$. (Panel titles: left, $M[\mu, \lambda]$, vary $\mu$; right, $M[\mu(\delta,\rho,0.01), \lambda]$, vary $\lambda$.)



5.1.3 Other penalization gives larger MSE

If our formalism is correct in deriving optimal penalization for $x^{1,\lambda}$, we will see that changes of the penalization away from $\lambda^*$ will cause MSE to increase. We consider the same situations as earlier, but now vary $\lambda$ away from the minimax value, while holding the other aspects of the problem fixed. In the Appendix, Tables 7 and 8 present numerical values of the empirical MSE obtained. Note the agreement of formal MSE, in which a saddlepoint is rigorously proven, and empirical MSE, which represents actual LARS/LASSO reconstructions. Also in this case we used R = 200 Monte Carlo replications.



To visualize the information in those tables, we refer to Figure 11.



5.1.4 MSE with more favorable measures is smaller

In our formalism, fixing $\lambda = \lambda^*$ and varying $\mu$ to smaller values will cause a reduction in formal MSE. Namely, if instead of $\mu^*(\delta, \rho, 0.01)$ we used $\mu^*(\delta, \rho, \alpha)$ for $\alpha$ significantly larger than 0.01, we would see a significant reduction in MSE, by an amount matching the predicted amount.

Recall that $\mathrm{mse}(\nu, \tau)$ denotes the 'risk' (MSE) of scalar soft thresholding as in Section 2, with input distribution $\nu$, noise variance 1, and threshold $\tau$. Now suppose that $\mathrm{mse}(\nu_0, \tau) > \mathrm{mse}(\nu_1, \tau)$. Then also the resulting formal noise-plus-interference obeys $\mathrm{fNPI}(\nu_0, \tau) > \mathrm{fNPI}(\nu_1, \tau)$. As noticed several times in Section 4.4, the formal MSE of AMPT obeys $\mathrm{fMSE}(\nu, \tau) = \mathrm{mse}(\tilde\nu, \tau)\cdot\mathrm{fNPI}(\nu, \tau)$, where $\tilde\nu$ denotes a rescaled probability measure (as in the proof of Proposition 4.1). Hence

$$\mathrm{fMSE}(\nu_1, \tau) \le \mathrm{mse}(\tilde\nu_1, \tau)\cdot\mathrm{fNPI}(\nu_0, \tau),$$

where the scaling uses $\mathrm{fNPI}(\nu_0)$. In particular, for $\mu = \mu^*(\delta, \rho, \alpha) = \mu^{\pm}(\delta\rho, \alpha)\sqrt{\mathrm{NPI}^*(\delta,\rho)}$, the three-point mixture $\nu_{\delta,\rho,\alpha}$ has

$$\mathrm{fMSE}(\nu_{\delta,\rho,\alpha}, \tau^*) \le (1 - \alpha)M^*(\delta,\rho),$$

and we ought to be able to see this. Table 9 shows results of simulations at N = 1500. The theoretical MSE drops as we move away from the nearly least-favorable $\mu$ in the direction of smaller $\mu$, and the empirical MSE responds similarly.

Figure 11: Scatterplots comparing theoretical and empirical MSEs found in Tables 7 and 8. Left panel: results at N = 1500. Right panel: results at N = 4000. Note visible tightening of the scatter around the identity line as N increases. (Horizontal axis in each panel: Theoretical MSE.)

Finding 5.2. The empirical data exhibit the saddlepoint structures predicted by the SE formalism.

5.1.5 MSE of Mixtures

The SE formalism contains a basic mathematical structure which allows one to infer that behavior at one saddlepoint determines the global minimax value: behavior under taking convex combinations (mixtures) of measures $\nu$.

Let $\mathrm{mse}(\nu, \tau)$ denote the 'risk' (MSE) of scalar soft thresholding as in Section 2. For such scalar thresholding, we have the affine relation

$$\mathrm{mse}((1 - \gamma)\nu_0 + \gamma\nu_1, \tau) = (1 - \gamma)\,\mathrm{mse}(\nu_0, \tau) + \gamma\cdot\mathrm{mse}(\nu_1, \tau).$$

Now suppose that $\mathrm{mse}(\nu_0, \tau) > \mathrm{mse}(\nu_1, \tau)$. Then also $\mathrm{NPI}(\nu_0, \tau) > \mathrm{NPI}(\nu_1, \tau)$. The formal MSE of AMPT obeys the scaling relation $\mathrm{fMSE}(\nu, \tau) = \mathrm{mse}(\tilde\nu, \tau)\cdot\mathrm{NPI}(\nu, \tau)$, where $\tilde\nu$ denotes the probability measure $\nu$ with its argument rescaled by $1/\sqrt{\mathrm{NPI}}$. We conclude that

$$\mathrm{fMSE}((1 - \gamma)\nu_0 + \gamma\nu_1, \tau) \le (1 - \gamma)\cdot\mathrm{mse}(\tilde\nu_0, \tau)\cdot\mathrm{NPI}(\nu_0, \tau) + \gamma\cdot\mathrm{mse}(\tilde\nu_1, \tau)\cdot\mathrm{NPI}(\nu_0, \tau), \qquad (5.1)$$
Figure 12: Convexity structures in formal MSE. Behavior of formal MSE of a 5-point mixture combining the nearly least-favorable $\mu$ with discount of 1% and one with discount of 50%. Also, the convexity bound (5.1) and the formal MSE of associated 3-point mixtures are displayed. $\delta = 0.25$, $\rho = \rho_{\mathrm{MSE}}(\delta)/2$. (Panel title: $M[\nu, \lambda]$, 3 & 5 point mixtures. Legend: 3-Point Mixture, 5-Point Mixture, Upper Bound, $\mu(\delta,\rho,0.01)$, $\mu(\delta,\rho,0.50)$.)



This 'quasi-affinity' relation allows us to extend the saddlepoint structure from 3-point mixtures to more general measures.

To check this, we consider two near-least-favorable measures, $\nu_0 = \nu_{\delta,\rho,0.02}$ and $\nu_1 = \nu_{\delta,\rho,0.50}$, and generate a range of mixtures $(1 - \alpha)\nu_0 + \alpha\nu_1$ by varying $\alpha$. When $\alpha \notin \{0, 1\}$ this is a 5-point mixture rather than one of the 3-point mixtures we have been studying. Figure 12 displays the convexity bound (5.1) and the behavior of the formal MSE of this 5-point mixture. For comparison it also presents the formal MSE of the 3-point mixture having its mass at the weighted mean $(1 - \alpha)\mu(\delta, \rho, 0.02) + \alpha\mu(\delta, \rho, 0.50)$. Evidently, the 5-point mixture typically has smaller MSE than the comparable 3-point mixture, and it is always below the convexity bound.

Finding 5.3. The empirical MSE obeys the mixture inequalities predicted by the SE formalism. 



5.2 Above Phase Transition

We conducted an empirical study of the formulas derived in Section 4.5. At $\delta = 0.25$ we chose $\rho = 0.401$, well above phase transition, and selected a range of $\tau$ and $\gamma$ values allowed by our formalism. For each pair $(\gamma, \tau)$, we generated R = 200 Monte Carlo realizations and obtained LASSO solutions with the given penalization parameter $\lambda$. The results are described in Table 6. The match between formal MSE and empirical MSE is acceptable.

Finding 5.4. Running $x^{1,\lambda}$ at the 3-point mixtures defined for the regime above phase transition in Lemma 4.5 yields empirical MSE consistent with the formulas of that Lemma.

This validates the unboundedness of MSE of LASSO above phase transition.
6 Extensions



6.1 Positivity Constraints 

A completely parallel treatment can be given for the case where $x^0 \ge 0$. In that setting, we use the positivity-constrained soft-threshold

$$\eta^+(x; \theta) = \begin{cases} x - \theta, & \theta < x, \\ 0, & x \le \theta, \end{cases} \qquad (6.1)$$

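and consider the corresponding positive-constrained thresholding minimax MSE [DJHS92]

$$M^+(\varepsilon) = \inf_\tau \sup_{\nu\in\mathcal{F}^+_\varepsilon} \mathbb{E}\{[\eta^+(X + \sigma\cdot Z; \tau\sigma) - X]^2\}, \qquad (6.2)$$

where

$$\mathcal{F}^+_\varepsilon = \{\nu : \nu \text{ is a probability measure with } \nu[0,\infty) = 1,\ \nu(\{0\}) \ge 1 - \varepsilon\}.$$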

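We consider the positivity-constrained $\ell_1$-penalized least-squares estimator $\hat{x}^{1,\lambda,+}$, the solution to

$$(P^+_{2,\lambda,1})\qquad \operatorname*{minimize}_{x\geq 0}\ \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1. \tag{6.3}$$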
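Problem (6.3) can be solved by proximal gradient descent. The proximal map of $t(\lambda\|x\|_1 + \iota_{\{x\geq 0\}})$ is exactly the positivity-constrained soft threshold $\eta^+(\cdot\,;t\lambda)$ of (6.1), applied entrywise. The sketch below is a plain ISTA loop with step $1/\|A\|_2^2$, plus a check of the KKT conditions of (6.3). It is an illustration under these assumptions, not the solver used for the paper's experiments. The function names are hypothetical.

```python
import numpy as np


def eta_plus(v, theta):
    """Entrywise positivity-constrained soft threshold (6.1)."""
    return np.maximum(v - theta, 0.0)


def lasso_plus(A, y, lam, iters=3000):
    """Minimize 0.5 * ||y - A x||_2^2 + lam * ||x||_1 subject to x >= 0."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        # gradient step on the quadratic, then the prox of (lam/L)*||x||_1 + indicator{x >= 0}
        x = eta_plus(x + A.T @ (y - A @ x) / L, lam / L)
    return x


def kkt_gap(A, y, x, lam):
    """Largest KKT violation for (6.3): g_i = lam where x_i > 0, g_i <= lam where x_i = 0."""
    g = A.T @ (y - A @ x)
    on = x > 0
    viol_on = np.abs(g[on] - lam) if on.any() else np.zeros(1)
    viol_off = np.maximum(g[~on] - lam, 0.0) if (~on).any() else np.zeros(1)
    return max(viol_on.max(), viol_off.max())
```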
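We define the minimax formal noise sensitivity:

$$M^{+,*}(\delta,\rho) = \sup_{\sigma>0}\max_{\nu}\min_{\lambda}\ \mathrm{fMSE}(\hat{x}^{1,\lambda,+},\nu,\sigma^2)/\sigma^2; \tag{6.4}$$

here $\nu \in \mathcal{F}^+_{\delta\rho}$ is the marginal distribution of $x^0$. Let $\rho^+_{\mathrm{MSE}}(\delta)$ denote the solution of

$$M^+(\rho\delta) = \delta. \tag{6.5}$$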



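In complete analogy to (1.7) we have the formula:

$$M^{+,*}(\delta,\rho) = \begin{cases} \dfrac{M^+(\delta\rho)}{1 - M^+(\delta\rho)/\delta}, & \rho < \rho^+_{\mathrm{MSE}}(\delta),\\[1ex] \infty, & \rho > \rho^+_{\mathrm{MSE}}(\delta).\end{cases} \tag{6.6}$$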
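Formulas (6.5)-(6.6) can be evaluated numerically once $M^+$ is available. This sketch assumes the following form of (6.2) with $\sigma = 1$, with the worst case pushing the nonzero mass to $+\infty$: $M^+(\varepsilon) = \inf_\tau\{\varepsilon(1+\tau^2) + (1-\varepsilon)[(1+\tau^2)\Phi(-\tau) - \tau\phi(\tau)]\}$. The bracketed objective is convex in $\tau$, so ternary search suffices. $M^+$ is increasing in $\varepsilon$ and satisfies $M^+(\varepsilon) \geq \varepsilon$, so (6.5) can be solved by bisection on $\varepsilon = \rho\delta \in [0,\delta]$. Function names are hypothetical.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def M_plus(eps, lo=0.0, hi=10.0, iters=100):
    """Assumed closed form of M^+(eps): minimize over tau by ternary search."""
    def G(tau):
        return eps * (1 + tau ** 2) + (1 - eps) * ((1 + tau ** 2) * Phi(-tau) - tau * phi(tau))
    for _ in range(iters):               # G is convex in tau
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if G(m1) < G(m2):
            hi = m2
        else:
            lo = m1
    return G(0.5 * (lo + hi))


def rho_plus_mse(delta, iters=60):
    """Solve (6.5), M^+(rho * delta) = delta, by bisection on eps = rho * delta."""
    lo, hi = 0.0, delta                  # M^+(0) = 0 < delta and M^+(delta) > delta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if M_plus(mid) < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / delta


def noise_sensitivity_plus(delta, rho):
    """Formula (6.6): finite below rho^+_MSE(delta), infinite otherwise."""
    if rho >= rho_plus_mse(delta):
        return math.inf
    m = M_plus(delta * rho)
    return m / (1.0 - m / delta)


for delta in (0.1, 0.25, 0.5, 0.75):
    r = rho_plus_mse(delta)
    print(f"delta={delta:.2f} rho+_MSE={r:.4f} M+*(delta, rho/2)={noise_sensitivity_plus(delta, r / 2):.4f}")
```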



The argument is the same as above, using the AMP formalism, with obvious modifications. The papers [DMM09a, DMM09b] show in more detail how to make arguments for AMP that apply simultaneously to the sign-constrained and unconstrained cases. All other features of Proposition 3.1 carry over, with obvious substitutions. Figure 13 shows the phase transition for the positivity-constrained case, as well as the contour lines of $M^{+,*}$. Again in analogy to the sign-unconstrained case, the phase boundary $\rho^+_{\mathrm{MSE}}$ occurs at precisely the same location as the phase boundary for equivalence; as earlier, this can be inferred from formulas in this paper and in [DMM09a].



6.2 Other Classes of Matrices 

We focused here on matrices A with Gaussian iid entries. 

Previously, extensive empirical evidence was presented by Donoho and Tanner [DT09] that pure $\ell_1$-minimization has its $\ell_1$-$\ell_0$ equivalence phase transition at the boundary $\rho_{\mathrm{MSE}}$. This holds not only for Gaussian matrices but for a wide collection of ensembles, including partial Fourier, partial Hadamard, expander graphs, and iid $\pm 1$. This is the noiseless, $\lambda = 0$ case of the general noisy, $\lambda > 0$ case studied here.

We believe that similar results to those obtained here hold for matrices $A$ with uniformly bounded iid entries with zero mean and variance $1/n$. In fact, we believe our results should extend to a broader universality class including matrices with iid entries with the same mean and variance, under an appropriate light tail condition.
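For experimenting with ensembles of the kind mentioned above, here is a small sketch of generators for iid matrices normalized to zero mean and variance $1/n$. It covers Gaussian entries, random signs $\pm 1/\sqrt{n}$, and entries uniform on $[-\sqrt{3/n}, \sqrt{3/n}]$; the last are uniformly bounded. The function name and ensemble labels are hypothetical. Structured ensembles such as partial Fourier, partial Hadamard or expander graphs are not covered.

```python
import numpy as np


def sensing_matrix(n, N, ensemble="gaussian", rng=None):
    """n x N matrix with iid zero-mean entries of variance 1/n."""
    rng = np.random.default_rng() if rng is None else rng
    if ensemble == "gaussian":
        return rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))
    if ensemble == "rademacher":                 # iid +-1, rescaled by 1/sqrt(n)
        return rng.choice([-1.0, 1.0], size=(n, N)) / np.sqrt(n)
    if ensemble == "uniform":                    # U[-b, b] has variance b^2 / 3
        b = np.sqrt(3.0 / n)
        return rng.uniform(-b, b, size=(n, N))
    raise ValueError(f"unknown ensemble: {ensemble}")
```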
[Figure 13 plot: contours of log(M^{+,*}(δ,ρ)) and ρ^+_MSE(δ) (dotted); horizontal axis δ, vertical axis ρ.]



Figure 13: Contour lines of the positivity-constrained minimax noise sensitivity $M^{+,*}(\delta,\rho)$ in the $(\delta,\rho)$ plane. The dotted black curve graphs the phase boundary $(\delta, \rho^+_{\mathrm{MSE}}(\delta))$. Above this curve, $M^{+,*}(\delta,\rho) = \infty$. The colored lines present level sets $M^{+,*}(\delta,\rho) = 1/8, 1/4, 1/2, 1, 2, 4$ (from bottom to top).



7 Relations with Statistical Physics and Information Theory 

This section outlines the relations of the approach advocated here with ideas in information theory 
(in particular, with the theory of sparse graph codes), graphical models and statistical physics (more 
precisely spin glass theory). We will not discuss such relations in full mathematical detail, but only 
stress some important points that might be useful for researchers in each of those fields. 

7.1 Information theory and message passing algorithms 

Message passing algorithms, and most notably belief propagation, have been intensively investigated in coding theory and communications, in particular because of their success in decoding sparse graph codes [RU08]. Belief propagation is defined whenever the a posteriori joint distribution of the variables to be inferred $x$, conditional on the observations $y$, can be written as a graphical model. In the present case this is easily done, provided the a priori probability distribution of the signal $x = (x_1, \dots, x_N)$ takes a product form $\nu = \nu_1 \times \nu_2 \times \cdots \times \nu_N$. The posterior is then

$$\mu(dx) = \frac{1}{Z}\prod_{a=1}^{n}\exp\Big\{-\frac{\beta}{2}\big(y_a - (Ax)_a\big)^2\Big\}\prod_{i=1}^{N}\nu_i(dx_i). \tag{7.1}$$

Graphical models of this type were (implicitly or explicitly) considered in the context of multiuser detection [Kab03, NS05, MPT06, MT06]. The underlying factor graph [KFL01] is the complete bipartite graph over $N$ variable nodes and $n$ factor nodes.
Applying belief propagation to such a model incurs two obvious difficulties. First, the graph is dense, and hence the complexity per iteration scales at least like $n^3$, and in fact worse. Second, the alphabet is continuous, and hence messages are not finitely representable. As discussed in [DMM10a], AMP solves both problems. From the information-theoretic perspective, the term $+z^{t-1}\,\mathrm{df}_t/n$ in Eq. (4.1) corresponds to 'subtracting intrinsic information'.
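To make the role of this correction term concrete, here is a minimal AMP sketch. It assumes the iteration $x^{t+1} = \eta(A^{\mathsf T} z^t + x^t;\theta_t)$, $z^t = y - Ax^t + (\mathrm{df}_t/n)\,z^{t-1}$ with $\mathrm{df}_t = \|x^t\|_0$. This is our reading of Eq. (4.1), which is not shown in this section. The threshold rule $\theta_t = \tau\|z^t\|_2/\sqrt{n}$ is also our assumption. Dropping the last term of the $z$ update turns the loop into plain iterative soft thresholding.

```python
import numpy as np


def soft(v, t):
    """Entrywise soft threshold eta(v; t)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def amp(A, y, tau, iters=50):
    """AMP with soft thresholding and the Onsager term (df_t / n) z^{t-1}."""
    n, N = A.shape
    x = np.zeros(N)
    z = y.copy()                                     # z^0 = y - A x^0 with x^0 = 0
    for _ in range(iters):
        theta = tau * np.linalg.norm(z) / np.sqrt(n)  # assumed rule: tau * sigma_hat_t
        x = soft(x + A.T @ z, theta)
        df = np.count_nonzero(x)                     # df_t = ||x^t||_0
        z = y - A @ x + (df / n) * z                 # residual plus Onsager correction
    return x
```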

An important difference between the message passing algorithms in coding theory and what is presented here is that no precise information is available on the priors $\nu_i$ in Eq. (7.1). Therefore the AMP rules should not be sensitive to the prior. The use of the soft threshold function $\eta(\,\cdot\,;\theta)$ makes AMP robust within the class of sparse priors. It is also directly related to the $\ell_1$ regularization in the LASSO.

In coding theory, message passing algorithms are analyzed through density evolution [RU08]. The common justification for density evolution is that the underlying graph is random and sparse, and hence converges locally to a tree in the large system limit. On trees, density evolution is exact; hence it is asymptotically exact for sparse random graphs. Such an easy justification is not available for the dense graphs treated here, and a deeper mathematical analysis is required. In [BM10] this analysis was carried out for Gaussian matrices $A$. It remains a challenge to generalize such analysis beyond the case of Gaussian matrices $A$.

Having outlined the relation with belief propagation and coding, it is important to clarify a key point. In the context of sparse graph coding, belief propagation performance and MAP (maximum a posteriori probability) performance do not generally coincide, even asymptotically, although they are intimately related [MMU04, MMRU09]. In the present paper we instead conjecture that AMP and LASSO have asymptotically equal MSE under appropriate calibration. This is due to the fact that the state evolution recursion $m_t \mapsto m_{t+1} = \Psi(m_t)$ has only one stable fixed point.
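The uniqueness of the fixed point can be checked numerically for a given prior. The sketch below iterates the state-evolution map from a very small and a very large starting value for one 3-point prior. It assumes the standard normalization $\Psi(m) = \sigma^2 + \delta^{-1}\mathbb{E}_\nu\{[\eta(X+\sqrt{m}Z;\tau\sqrt{m})-X]^2\}$ with $\sigma^2 = 1$. The values of $\delta$, $\rho$, $\tau$, $\mu$ are illustrative and are not taken from the paper.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def soft_risk(mu, tau):
    """E[(eta(mu + Z; tau) - mu)^2] for Z ~ N(0, 1) and the soft threshold eta."""
    a, b = tau - mu, tau + mu
    return ((1 + tau ** 2) * (Phi(-a) + Phi(-b))
            - b * phi(a) - a * phi(b)
            + mu ** 2 * (Phi(a) - Phi(-b)))


def psi(m, nu, delta, tau, sigma2=1.0):
    """One state-evolution step: sigma2 + E_nu[(eta(X + sqrt(m) Z; tau sqrt(m)) - X)^2] / delta."""
    s = math.sqrt(m)
    return sigma2 + sum(p * m * soft_risk(x / s, tau) for x, p in nu) / delta


def iterate(m, nu, delta, tau, iters=2000):
    for _ in range(iters):
        m = psi(m, nu, delta, tau)
    return m


delta, rho, tau, mu = 0.25, 0.134, 1.5, 4.0    # illustrative values only
eps = delta * rho
nu = [(0.0, 1 - eps), (mu, eps / 2), (-mu, eps / 2)]
low = iterate(1e-6, nu, delta, tau)
high = iterate(1e6, nu, delta, tau)
print(low, high)
```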

7.2 Statistical physics 

There is a well studied connection between statistical physics techniques and message passing algorithms [MM09]. In particular, the sum-product algorithm corresponds to the Bethe-Peierls approximation in statistical physics, and its fixed points are stationary points of the Bethe free energy. In the context of spin glass theory, the Bethe-Peierls approximation is also referred to as the 'replica symmetric cavity method'³.

The Bethe-Peierls approximation postulates a set of non-linear equations on quantities that correspond to the belief propagation messages, and allows one to compute posterior marginals under the distribution (7.1). In the special case of spin glasses on the complete graph (the celebrated Sherrington-Kirkpatrick model), these equations reduce to the so-called TAP equations. They are named after Thouless, Anderson and Palmer, who first used them [TAP77].

The original TAP equations were a set of non-linear equations for local magnetizations (i.e. expectations of a single variable). Thouless, Anderson and Palmer first recognized that naive mean field is not accurate enough in the spin glass model. They corrected it by adding the so-called Onsager reaction term, which is analogous to the term $\mathrm{df}_t/n$ in Eq. (4.1). More than 30 years after the original paper, a complete mathematical justification of the TAP equations remains an open problem in spin glass theory, although important partial results exist [Tal03]. The connection between belief propagation and the Bethe-Peierls approximation stimulated a considerable amount of research [YFW05]. By contrast, the algorithmic uses of TAP equations have received only sparse attention. Remarkable exceptions include [OW01, Kab03, NS05].

³ When this terminology is used in statistical physics, the emphasis is rather on properties of random instances.



7.3 State evolution and replica calculations 



Within statistical mechanics, the typical properties of probability measures of the form (7.1) are studied using the replica method or the cavity method [MM09]. These can be described as non-rigorous but mathematically sophisticated techniques. Despite intense efforts and some spectacular progress [Tal03], even a precise statement of the assumptions implicit in such techniques is missing in a general setting.

The fixed points of state evolution describe the output of the corresponding AMP when the latter is run for a sufficiently large number of iterations (independent of the dimensions $n$, $N$). It is well known within statistical mechanics [MM09] that the fixed point equations do indeed coincide with the equations obtained from the replica method (in its replica-symmetric form).

During the last few months, several papers investigated compressed sensing problems using the replica method [RFG09, KWT09, GBS09]. In view of the discussion above, it is not surprising that these results can be recovered from the state evolution formalism put forward in [DMM09a]. Let us mention that the latter has several advantages over the replica method:

(1) It is more concrete, and its assumptions can be checked quantitatively through simulations; 

(2) It is intimately related to efficient message passing algorithms; 

(3) It actually allows one to predict the performance of these algorithms (including, for instance, precise convergence time estimates);

(4) It actually leads to rigorous statements, at least in the case of Gaussian sensing matrices. 

A Some explicit formulae 

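This appendix contains some formulae and analytical derivations omitted from the main text.
The phase boundary curve admits the parametric expression

$$\delta = \frac{2\phi(\tau)}{\tau + 2\big(\phi(\tau) - \tau\Phi(-\tau)\big)}, \tag{A.1}$$

$$\rho = 1 - \frac{\tau\Phi(-\tau)}{\phi(\tau)}, \tag{A.2}$$

$$\tau \in (0,\infty). \tag{A.3}$$

This is obtained directly from Eq. (2.5). Call $G_\varepsilon(\tau)$ the function of $\tau$ minimized on the right-hand side. The parametric expression given here then follows from $\delta = G_\varepsilon(\tau)$ and $G'_\varepsilon(\tau) = 0$ with $\varepsilon = \rho\delta$; together these two conditions are equivalent to $\delta = M^\pm(\varepsilon)$.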
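The parametric expression can be checked numerically. Eq. (2.5) lies outside this excerpt, so we assume it reads $M^\pm(\varepsilon) = \inf_\tau G_\varepsilon(\tau)$ with $G_\varepsilon(\tau) = \varepsilon(1+\tau^2) + 2(1-\varepsilon)[(1+\tau^2)\Phi(-\tau) - \tau\phi(\tau)]$. This is the standard minimax risk of soft thresholding over $\varepsilon$-sparse priors. With this $G_\varepsilon$, solving $G'_\varepsilon(\tau) = 0$ for $\varepsilon$ and substituting into $\delta = G_\varepsilon(\tau)$ reproduces (A.1)-(A.2) exactly. The sketch below confirms that each parametric point satisfies $M^\pm(\rho\delta) = \delta$. Function names are hypothetical.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def boundary_point(tau):
    """(delta, rho) on the phase boundary from (A.1)-(A.2)."""
    a = phi(tau) - tau * Phi(-tau)
    return 2 * phi(tau) / (tau + 2 * a), 1 - tau * Phi(-tau) / phi(tau)


def G(eps, tau):
    """Assumed form of the function of tau minimized in Eq. (2.5)."""
    return eps * (1 + tau ** 2) + 2 * (1 - eps) * ((1 + tau ** 2) * Phi(-tau) - tau * phi(tau))


def M_pm(eps, lo=0.0, hi=10.0, iters=100):
    """M^±(eps) = min over tau of G(eps, tau), by ternary search (G is convex in tau)."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if G(eps, m1) < G(eps, m2):
            hi = m2
        else:
            lo = m1
    return G(eps, 0.5 * (lo + hi))


for tau in (0.25, 0.5, 1.0, 2.0, 3.0):
    delta, rho = boundary_point(tau)
    print(f"tau={tau:4.2f} delta={delta:.4f} rho={rho:.4f} "
          f"M(rho*delta)-delta={M_pm(rho * delta) - delta:.1e}")
```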

B Tables 

This appendix contains tables of empirical results supporting our claims.
Table 6: Results above phase transition. Parameters of the construction as well as theoretical predictions and resulting empirical MSE figures.
Table 7: N = 1500, λ dependence of the MSE at fixed μ.
Table 8: N = 4000, λ dependence of the MSE at fixed μ.



δ      ρ      μ       λ       fMSE   eMSE   SE
0.100  0.095  5.791   0.402   0.152  0.144  0.0017
0.100  0.095  5.791   1.258   0.136  0.128  0.0016
0.100  0.095  5.791   2.037   0.142  0.133  0.0016
0.100  0.095  5.791   ?       0.174  ?      ?
0.100  0.095  5.791   ?       ?      ?      ?
0.100  0.142  8.242   ?       ?      ?      ?
0.100  0.142  8.242   ?       ?      ?      ?
0.100  0.142  8.242   ?       ?      ?      ?
0.100  0.142  8.242   ?       0.717  0.716  ?
0.100  0.170  12.906  ?       ?      0.950  ?
0.100  0.170  12.906  2.298   1.178  1.111  0.0252
0.100  0.170  12.906  ?       1.619  ?      ?
0.100  0.170  12.906  ?       2.197  2.182  ?
0.100  0.180  18.278  ?       ?      ?      ?
0.100  0.180  18.278  ?       ?      2.171  ?
0.100  0.180  18.278  ?       ?      ?      ?
0.100  0.180  18.278  ?       4.677  4.551  ?
0.150  0.109  5.631   0.420   0.236  0.228  0.0022
0.150  0.109  5.631   1.073   0.212  0.209  0.0023
0.150  0.109  5.631   1.700   0.218  0.213  0.0021
0.150  0.109  5.631   ?       ?      ?      0.0024
0.150  0.109  5.631   4.284   0.359  0.353  0.0017
0.150  0.163  8.030   0.720   0.588  0.595  0.0072
0.150  0.163  8.030   1.614   0.626  0.610  0.0078
0.150  0.163  8.030   3.135   0.804  0.807  0.0058
0.150  0.163  8.030   5.868   1.125  1.118  0.0047
0.150  0.196  12.577  0.434   1.612  1.572  0.0341
0.150  0.196  12.577  1.814   1.792  1.720  0.0281
0.150  0.196  12.577  4.339   2.433  2.383  0.0205
0.150  0.196  12.577  8.903   3.359  3.333  0.0126
0.150  0.207  17.814  0.305   3.185  2.864  0.0861
0.150  0.207  17.814  2.231   3.715  3.582  0.0722
0.150  0.207  17.814  5.879   5.202  5.141  0.0439
0.150  0.207  17.814  12.455  7.142  7.154  0.0269



Table 9: N = 1500, μ dependence of the MSE at fixed λ

δ      ρ      μ       λ       fMSE   eMSE   SE
0.250  ?      ?       ?       ?      0.587  ?
0.250  0.201  ?       ?       ?      ?      ?
0.250  0.201  ?       ?       ?      ?      ?
0.250  0.201  ?       0.591   1.009  ?      ?
0.250  0.201  ?       ?       ?      ?      ?
0.500  0.193  4.194   0.684   0.769  0.770  0.0052
0.500  0.193  4.694   0.687   0.818  0.823  0.0066
0.500  0.193  4.944   0.688   0.837  0.838  0.0073
0.500  0.193  5.094   0.689   0.847  0.835  0.0068
0.500  0.193  5.194   0.689   0.853  0.834  0.0073
0.500  0.193  5.294   0.689   0.858  0.845  0.0079
0.500  0.193  5.444   0.690   0.865  0.863  0.0079
0.500  0.193  5.694   0.690   0.874  0.887  0.0085
0.500  0.193  ?       ?       ?      ?      ?
0.500  0.289  6.354   0.398   2.119  2.071  0.0195
0.500  0.289  6.854   0.399   2.234  2.214  0.0235
0.500  0.289  7.104   0.399   2.284  2.157  0.0252
0.500  0.289  7.254   0.400   2.313  2.271  0.0244
0.500  0.289  7.354   0.400   2.329  2.316  0.0275
0.500  0.289  7.454   0.400   2.346  2.287  0.0287
0.500  0.289  7.604   0.400   2.370  2.327  0.0306
0.500  0.289  7.854   0.401   2.404  2.339  0.0284
0.500  0.289  8.000   0.401   2.422  2.409  0.0300



Table 10: N = 1500, MSE for 5-point prior

δ      ρ      μ      λ      Theoretical MSE  Empirical MSE  α
0.250  0.134  1.894  0.857  0.120            0.151
0.250  0.134  2.171  0.897  0.162            0.163          0.122
0.250  0.134  2.447  0.901  0.178            0.177          0.244
0.250  0.134  2.724  0.906  0.196            0.195          0.366
0.250  0.134  3.001  0.912  0.215            0.210          0.488
0.250  0.134  3.277  0.918  0.237            0.236          0.611
0.250  0.134  3.554  0.926  0.261            0.257          0.7333
0.250  0.134  3.830  0.935  0.287            0.280          0.8556
0.250  0.134  4.107  0.945  0.317            0.307          0.9778
0.250  0.134  4.383  0.957  0.348            0.359          1.1000